Abstract
Data quality and especially the assessment of data quality have been intensively discussed in research and practice alike. To support an economically oriented management of data quality and decision making under uncertainty, it is essential to assess the data quality level by means of well-founded metrics. However, if not adequately defined, these metrics can lead to wrong decisions and economic losses. Therefore, based on a decision-oriented framework, we present a set of five requirements for data quality metrics. These requirements are relevant for a metric that aims to support an economically oriented management of data quality and decision making under uncertainty. We further demonstrate the applicability and efficacy of these requirements by evaluating five data quality metrics for different data quality dimensions. Moreover, we discuss practical implications when applying the presented requirements.