GUPT: Privacy Preserving Data Analysis Made Easy

ABSTRACT
It is often highly valuable for organizations to have their data analyzed by external agents. However, any program that computes on potentially sensitive data risks leaking information through its output. Differential privacy provides a theoretical framework for processing data while protecting the privacy of individual records in a dataset. Unfortunately, it has seen limited adoption because of the loss in output accuracy, the difficulty in making programs differentially private, the lack of mechanisms to describe the privacy budget in a programmer's utilitarian terms, and the challenging requirement that data owners and data analysts manually distribute the limited privacy budget between queries.
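To make the framework concrete: the classic way to achieve differential privacy is the Laplace mechanism of Dwork et al. (TCC 2006), which adds noise calibrated to a query's sensitivity divided by the privacy parameter epsilon. The sketch below is a minimal illustration of that mechanism, not GUPT's implementation; the function names and the clipping-based sensitivity bound are this sketch's own choices.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from Lap(0, scale) by inverse-transform sampling."""
    u = random.random() - 0.5            # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, lo, hi, epsilon):
    """Epsilon-differentially private mean of values clipped to [lo, hi].

    Clipping bounds each record's influence on the output, so the
    sensitivity of the mean is (hi - lo) / n; the Laplace mechanism
    then adds noise with scale sensitivity / epsilon.
    """
    n = len(values)
    clipped = [min(max(v, lo), hi) for v in values]
    sensitivity = (hi - lo) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon)
```

Smaller epsilon means stronger privacy but noisier answers, which is the accuracy loss the abstract refers to.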
This paper presents the design and evaluation of a new system, GUPT, that overcomes these challenges. Unlike existing differentially private systems such as PINQ and Airavat, it guarantees differential privacy to programs not developed with privacy in mind, makes no trust assumptions about the analysis program, and is secure against all known classes of side-channel attacks.
GUPT uses a new model of data sensitivity that degrades the privacy of data over time. This enables efficient allocation of different levels of privacy for different user applications while guaranteeing an overall constant level of privacy and maximizing the utility of each application. GUPT also introduces techniques that improve the accuracy of output while achieving the same level of privacy. These approaches enable GUPT to easily execute a wide variety of data analysis programs while providing both utility and privacy.
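The budget allocation the abstract describes rests on sequential composition: the epsilons of successive queries on the same dataset add up, so a fixed total must be rationed among applications. The sketch below shows only that basic accounting, with hypothetical names; it does not model GUPT's time-based degradation, under which older data becomes less sensitive and effectively frees up budget.

```python
class PrivacyBudget:
    """Minimal privacy-budget ledger (hypothetical API, not GUPT's).

    Sequential composition: if k queries are answered with privacy
    parameters e_1, ..., e_k, the combined release is
    (e_1 + ... + e_k)-differentially private, so a fixed total
    epsilon must be split among them.
    """

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> bool:
        """Reserve epsilon for one query; refuse if it would overdraw."""
        if epsilon <= 0.0 or epsilon > self.remaining:
            return False
        self.remaining -= epsilon
        return True

budget = PrivacyBudget(1.0)
print(budget.spend(0.4))   # True: roughly 0.6 remains
print(budget.spend(0.7))   # False: would exceed what is left
print(budget.spend(0.6))   # True: budget now essentially exhausted
```

Manually choosing these per-query epsilons is exactly the burden on data owners and analysts that GUPT's allocation mechanism is designed to remove.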
REFERENCES

- N. Anciaux, L. Bouganim, H. van Heerde, P. Pucheral, and P. M. Apers. Data degradation: Making private data less sensitive over time. In CIKM, 2008.
- F. Bancilhon and R. Ramakrishnan. An amateur's introduction to recursive query processing strategies. In SIGMOD, 1986.
- M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, Aug. 2006.
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
- C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum. Differential privacy under continual observation. In STOC, 2010.
- A. Frank and A. Asuncion. UCI machine learning repository, 2010.
- S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, 2008.
- A. Ghosh and A. Roth. Selling privacy at auction. http://arxiv.org/abs/1011.1375.
- A. Haeberlen, B. C. Pierce, and A. Narayan. Differential privacy under fire. In USENIX Security, 2011.
- M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow., 3:1021--1032, September 2010.
- V. Karwa, S. Raskhodnikova, A. Smith, and G. Yaroslavtsev. Private analysis of graph structure. In VLDB, 2011.
- D. Kifer. Attacks on privacy and de Finetti's theorem. In SIGMOD, 2009.
- C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In PODS, 2010.
- A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006.
- F. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD, 2009.
- F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.
- A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In IEEE Symposium on Security and Privacy, 2008.
- K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC, 2007.
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999--66, Stanford InfoLab, 1999.
- V. Rastogi and S. Nath. Differentially private aggregation of distributed time-series with transformation and encryption. In SIGMOD, 2010.
- I. Roy, S. T. V. Setty, A. Kilzer, V. Shmatikov, and E. Witchel. Airavat: security and privacy for MapReduce. In NSDI, 2010.
- A. Serjantov and G. Danezis. Towards an information theoretic metric for anonymity. In PET, 2002.
- A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In STOC, 2011.
- L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002.
- H. van Heerde, M. Fokkinga, and N. Anciaux. A framework to balance privacy and data usability using data degradation. In CSE, 2009.
- C. Wright, C. Cowan, S. Smalley, J. Morris, and G. Kroah-Hartman. Linux Security Modules: General security support for the Linux kernel. In USENIX Security, 2002.
- X. Xiao, G. Bender, M. Hay, and J. Gehrke. iReduct: Differential privacy with reduced relative errors. In SIGMOD, 2011.