research-article

Ensuring High-Quality Private Data for Responsible Data Science: Vision and Challenges

Authors:
Divesh Srivastava, AT&T Labs Research, Bedminster, NJ, USA
Monica Scannapieco, Italian National Institute of Statistics, Roma, Italy
Thomas C. Redman, Data Quality Solutions, Rumson, NJ, USA

Journal of Data and Information Quality, Volume 11, Issue 1, Article No. 1, pp. 1–9. https://doi.org/10.1145/3287168

Published: 04 January 2019

Abstract

High-quality data is critical for effective data science. As the use of data science has grown, so too have concerns that individuals’ rights to privacy will be violated. This has led to the development of data protection regulations around the globe and the use of sophisticated anonymization techniques to protect privacy. Such measures make it more challenging for the data scientist to understand the data, exacerbating issues of data quality. Responsible data science aims to develop useful insights from the data while fully embracing these considerations.

In this article, we pose the high-level problem: “How can a data scientist develop the needed trust that private data has high quality?” We then identify a series of challenges for various data-centric communities and outline research questions that data quality and privacy researchers would need to address to answer this problem effectively.

Published in
Journal of Data and Information Quality Volume 11, Issue 1
On the Horizon, Regular Papers and Challenge Paper
March 2019
60 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3303842
Editor: Tiziana Catarci, Sapienza University of Rome, Rome, Italy
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 January 2019
- Received: 1 October 2018
- Accepted: 1 October 2018

Author Tags
Responsible data science
data trust
private data
quality of private data
Qualifiers
- research-article
- Research
- Refereed