research-article

Requirements for Data Quality Metrics

Published: 22 January 2018

Abstract

Data quality, and especially its assessment, has been discussed intensively in research and practice alike. To support economically oriented data quality management and decision making under uncertainty, the data quality level must be assessed with well-founded metrics. If these metrics are not adequately defined, however, they can lead to wrong decisions and economic losses. Therefore, based on a decision-oriented framework, we present a set of five requirements for data quality metrics. These requirements apply to any metric that aims to support economically oriented data quality management and decision making under uncertainty. We further demonstrate the applicability and efficacy of these requirements by evaluating five data quality metrics for different data quality dimensions. Moreover, we discuss the practical implications of applying the presented requirements.



    • Published in

      Journal of Data and Information Quality, Volume 9, Issue 2
      Challenge Paper, Experience Paper and Research Paper
      June 2017
      77 pages
      ISSN: 1936-1955
      EISSN: 1936-1963
      DOI: 10.1145/3155015

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 January 2018
      • Accepted: 1 September 2017
      • Revised: 1 August 2017
      • Received: 1 July 2016
      Published in JDIQ Volume 9, Issue 2


      Qualifiers

      • research-article
      • Research
      • Refereed
