ABSTRACT
We consider private data analysis in the setting in which a trusted and trustworthy curator, having obtained a large data set containing private information, releases to the public a "sanitization" of the data set that simultaneously protects the privacy of the individual contributors of data and offers utility to the data analyst. The sanitization may be in the form of an arbitrary data structure, accompanied by a computational procedure for determining approximate answers to queries on the original data set, or it may be a "synthetic data set" consisting of data items drawn from the same universe as items in the original data set; queries are carried out as if the synthetic data set were the actual input. In either case the process is non-interactive; once the sanitization has been released the original data and the curator play no further role.
For the task of sanitizing with a synthetic dataset output, we map the boundary between computational feasibility and infeasibility with respect to a variety of utility measures. For the (potentially easier) task of sanitizing with unrestricted output format, we show a tight qualitative and quantitative connection between hardness of sanitizing and the existence of traitor tracing schemes.
- A. Blum, K. Ligett, and A. Roth. A learning theory approach to non--interactive database privacy. In STOC, pages 609--618, 2008. Google ScholarDigital Library
- D. Boneh and M. Naor. Traitor tracing with constant size ciphertext. In ACM Conference on Computer and Communications Security, pages 501--510, 2008. Google ScholarDigital Library
- D. Boneh, A. Sahai, and B. Waters. Fully collusion resistant traitor tracing with short ciphertexts and private keys. In EUROCRYPT, pages 573--592, 2006. Google ScholarDigital Library
- B. Chor, A. Fiat, and M. Naor. Tracing traitors. In CRYPTO, pages 257--270, 1994. Google ScholarDigital Library
- I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, pages 202--210, 2003. Google ScholarDigital Library
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevy and T. Rabin, editors, First Theory of Cryptography Conference (TCC), volume 3876, pages 265--284. Springer-Verlag, 2006. Google ScholarDigital Library
- C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. In STOC, pages 85--94, 2007. Google ScholarDigital Library
- C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, pages 528--544, 2004.Google ScholarCross Ref
- C. Dwork and S. Yekhanin. New efficient attacks on statistical disclosure control mechanisms. In CRYPTO, pages 469--480, 2008. Google ScholarDigital Library
- D. Feldman, A. Fiat, H. Kaplan, and K. Nissim. Private coresets. These Proceedings, 2009. Google ScholarDigital Library
- O. Goldreich. The Foundations of Cryptography -- Volume 2. Cambridge University Press, 2004. Google ScholarDigital Library
- O. Goldreich, S. Goldwasser, and S. Micali. How to construct pseudorandom functions. Journal of the ACM, 33(2):792--807, 1986. Google ScholarDigital Library
- S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270--299, 1984.Google ScholarCross Ref
- R. Impagliazzo, R. Jaiswal, V. Kabanets, and A. Wigderson. Uniform direct product theorems: simplified, optimized, and derandomized. In STOC, pages 579--588, 2008. Google ScholarDigital Library
- P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? In FOCS, pages 1--19, 2008. Google ScholarDigital Library
- A. Kiayias and M. Yung. Self protecting pirates and black-box traitor tracing. In CRYPTO, pages 63--79, 2001. Google ScholarDigital Library
- F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94--103. IEEE Computer Society, 2007. Google ScholarDigital Library
- M. E. Saks and S. Zhou. Bp space(s) subseteq dspace(s3/2). J. Comput. Syst. Sci., 58(2):376--403, 1999. Google ScholarDigital Library
- R. E. Schapire. Theoretical views of boosting and applications. In ATL, pages 13--25, 1999. Google ScholarDigital Library
- L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134--1142, 1984. Google ScholarDigital Library
Index Terms
- On the complexity of differentially private data release: efficient algorithms and hardness results
Recommendations
A differentially private algorithm for location data release
The rise of mobile technologies in recent years has led to large volumes of location information, which are valuable resources for knowledge discovery such as travel patterns mining and traffic analysis. However, location dataset has been confronted ...
Incremental release of differentially-private check-in data
Due to the growing popularity of location-based services and geo-social networks, users communicate more and more private location traces to service providers, as well as explicit spatio-temporal data, often called "check-ins", about their presence in ...
Differentially private data publishing via optimal univariate microaggregation and record perturbation
AbstractWe present an approach to generate differentially private data sets that consists in adding noise to a microaggregated version of the original data set. While this idea has already been pursued in the literature to reduce the ...
Comments