Data Quality: The Accuracy Dimension
January 2003
Publisher: Morgan Kaufmann Publishers Inc., 340 Pine Street, Sixth Floor, San Francisco, CA, United States
ISBN: 978-1-55860-891-7
Published: 09 January 2003
Pages: 300
Abstract

Data Quality: The Accuracy Dimension is about assessing the quality of corporate data and improving its accuracy using the data profiling method. Corporate data is increasingly important as companies continue to find new ways to use it. Likewise, improving the accuracy of data in information systems is fast becoming a major goal as companies realize how much it affects their bottom line. Data profiling is a new technology that supports and enhances the accuracy of databases throughout major IT shops. Jack Olson explains data profiling and shows how it fits into the larger picture of data quality.

  • Provides an accessible, enjoyable introduction to the subject of data accuracy, peppered with real-world anecdotes.
  • Provides a framework for data profiling with a discussion of analytical tools appropriate for assessing data accuracy.
  • Is written by one of the original developers of data profiling technology.
  • Is a must-read for any data management staff, IT management staff, and CIOs of companies with data assets.

Cited By

  1. ACM
    Francisco M, Alves-Souza S, Campos E and De Souza L Total Data Quality Management and Total Information Quality Management Applied to Costumer Relationship Management Proceedings of the 9th International Conference on Information Management and Engineering, (40-45)
  2. ACM
    Song S, Zhu H and Wang J Constraint-Variance Tolerant Data Repairing Proceedings of the 2016 International Conference on Management of Data, (877-892)
  3. ACM
    Xu H (2015). What Are the Most Important Factors for Accounting Information Quality and Their Impact on AIS Data Quality Outcomes?, Journal of Data and Information Quality, 5:4, (1-22), Online publication date: 3-Mar-2015.
  4. Liu S, Zhao Q and Wu X (2014). Feature selection based on partition clustering, International Journal of Knowledge-based and Intelligent Engineering Systems, 18:2, (135-142), Online publication date: 1-Apr-2014.
  5. Alpar P and Winkelsträter S (2014). Assessment of data quality in accounting data with association rules, Expert Systems with Applications: An International Journal, 41:5, (2259-2268), Online publication date: 1-Apr-2014.
  6. ACM
    Pavlov I A QoX model for ETL subsystems Proceedings of the 14th International Conference on Computer Systems and Technologies, (15-21)
  7. ACM
    Lóscio B, Batista M, Souza D and Salgado A Using information quality for the identification of relevant web data sources Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, (36-44)
  8. ACM
    Collins C and Janssens K (2012). Creating a General (Family) Practice Epidemiological Database in Ireland - Data Quality Issue Management, Journal of Data and Information Quality, 4:1, (1-9), Online publication date: 1-Oct-2012.
  9. Fürber C and Hepp M Using semantic web resources for data quality management Proceedings of the 17th international conference on Knowledge engineering and management by the masses, (211-225)
  10. ACM
    Khatri V and Brown C (2010). Designing data governance, Communications of the ACM, 53:1, (148-152), Online publication date: 1-Jan-2010.
  11. ACM
    Fisher C, Lauria E and Matheus C (2009). An Accuracy Metric, Journal of Data and Information Quality, 1:3, (1-21), Online publication date: 1-Dec-2009.
  12. ACM
    Rodic J and Baranovic M Generating data quality rules and integration into ETL process Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP, (65-72)
  13. ACM
    Hüner K, Ofner M and Otto B Towards a maturity model for corporate data quality management Proceedings of the 2009 ACM symposium on Applied Computing, (231-238)
  14. ACM
    Jovanovic V and Cupic L Teaching agile validation of data models Proceedings of the 9th ACM SIGITE conference on Information technology education, (139-146)
  15. van Hooland S, Bontemps Y and Kaufman S Answering the call for more accountability Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications, (93-103)
  16. Farinha J and Trigueiros M An extensible metadata framework for data quality assessment of composite structures Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery, (34-44)
  17. Cappiello C, Comuzzi M and Plebani P On automated generation of web service level agreements Proceedings of the 19th international conference on Advanced information systems engineering, (264-278)
  18. Gomes P, Farinha J and Trigueiros M A data quality metamodel extension to CWM Proceedings of the fourth Asia-Pacific conference on Comceptual modelling - Volume 67, (17-26)
  19. Ardagna D, Cappiello C, Francalanci C and Groppi A Brokering multisource data with quality constraints Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part I, (807-817)
  20. ACM
    Chen Z and Narasayya V Efficient computation of multiple group by queries Proceedings of the 2005 ACM SIGMOD international conference on Management of data, (263-274)
  21. ACM
    Leser U and Freytag J Mining for patterns in contradictory data Proceedings of the 2004 international workshop on Information quality in information systems, (51-58)

Michael Vassilakopoulos

The quality of data affects all aspects of corporate information management, from data processing to decision-making support. Data accuracy, the focus of this book, is the most important element of data quality. Corporations are paying increasing attention to identifying and handling data quality and accuracy problems as they realize the significant financial implications of these problems. In this book, the author, who is an expert on information systems development, presents his experience with these issues in a systematic way. The book consists of three parts. The first part, comprising three chapters, introduces the concepts of data quality and accuracy and their impact on information systems, surveys data quality assurance technology, analyzes the aspects of data quality and accuracy, describes techniques for finding inaccurate values, and analyzes the different sources of inaccuracies. Part 2, also comprising three chapters, focuses on the data quality assurance process. The first chapter of this part presents the structure and methods of a data quality assurance program. This chapter introduces the fundamental distinction between the inside-out method (which starts by analyzing the data to discover inaccuracies) and the outside-in method (which starts by looking for negative impacts at the business level that may be related to data quality problems). The next chapter focuses on managing the inaccuracy issues that arise from the quality facts discovered, while the last chapter of this part covers the business case for a quality assurance program. The last and largest part of the book is more technical. It consists of seven chapters and presents data profiling technology, the core technology of the inside-out approach, which has emerged over the last few years. The first chapter is an overview of this technology, and the next six chapters present its steps in more detail.
As the author notes, "This part is not an exhaustive treatment of the topic of data profiling"; however, it provides a clear view of this technology. This part would be more helpful to the interested practitioner if it included citations to more technical texts. The last chapter summarizes the contribution of the book by commenting on a number of messages that the book aims to convey to its audience. Two appendices follow that demonstrate the principles of data profiling technology. This book is very well structured and written, with a gradual presentation of the various aspects of data quality and accuracy. Although it is biased toward the conceptual view (rather than the technical view), it is one of the most practical books on data profiling I've seen. It can serve as a guide to the data quality and accuracy issues that will receive attention in the years to come. This book is valuable for multiple audiences, depending on their needs. The first part could serve as introductory material on data quality and accuracy for a wide audience interested in information technology. Executives and managers would be more interested in the first and second parts, while system designers and developers, database administrators, computer science students, and especially data quality practitioners can benefit from studying the whole book (especially Part 3).

Online Computing Reviews Service
