ABSTRACT
Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason about privacy in the face of rich, realistic sources of auxiliary information. Specifically, we investigate the effectiveness of current anonymization schemes in preserving privacy when multiple organizations independently release anonymized data about overlapping populations.
1. We investigate composition attacks, in which an adversary uses independent anonymized releases to breach privacy. We explain why recently proposed models of limited auxiliary information fail to capture composition attacks. Our experiments demonstrate that even a simple instance of a composition attack can breach privacy in practice, for a large class of currently proposed techniques. The class includes k-anonymity and several recent variants.
2. On a more positive note, certain randomization-based notions of privacy (such as differential privacy) provably resist composition attacks and, in fact, the use of arbitrary side information.This resistance enables "stand-alone" design of anonymization schemes, without the need for explicitly keeping track of other releases.
We provide a precise formulation of this property, and prove that an important class of relaxations of differential privacy also satisfy the property. This significantly enlarges the class of protocols known to enable modular design.
- J. Official Statistics, special issue on disclosure limitation methods, Vol. 14, No. 4, 1998.Google Scholar
- R. Agrawal and R. Srikant. Privacy-preserving data mining. In SIGMOD, pages 439--450. ACM Press, 2000. Google ScholarDigital Library
- S. Agrawal and J. R. Haritsa. A framework for high-accuracy privacy-preserving mining. In ICDE, pages 193--204. IEEE Computer Society, 2005. Google ScholarDigital Library
- B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS, pages 273--282. ACM Press, 2007. Google ScholarDigital Library
- M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, Aug. 2006.Google Scholar
- A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: The SuLQ framework. In PODS, pages 128--138. ACM Press, 2005. Google ScholarDigital Library
- A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, pages 609--618. ACM Press, 2008. Google ScholarDigital Library
- J.-W. Byun, Y. Sohn, E. Bertino, and N. Li. Secure anonymization for incremental datasets. In Secure Data Management, pages 48--63. Springer, 2006. Google ScholarDigital Library
- K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In CRYPTO, pages 198--213. Springer, 2006. Google ScholarDigital Library
- B.-C. Chen, R. Ramakrishnan, and K. LeFevre. Privacy skyline: Privacy with multidimensional adversarial knowledge. In VLDB, pages 770--781. VLDB Endowment, 2007. Google ScholarDigital Library
- I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, pages 202--210. ACM Press, 2003. Google ScholarDigital Library
- J. Domingo-Ferrer and J. M. Mateo-Sanz. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, 14(1):189--201, 2002. Google ScholarDigital Library
- C. Dwork. Differential privacy. In ICALP, pages 1--12. Springer, 2006. Google ScholarDigital Library
- C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503. Springer, 2006. Google ScholarDigital Library
- C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503. Springer, 2006. Google ScholarDigital Library
- C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, pages 528--544. Springer, 2004.Google ScholarCross Ref
- A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In KDD, pages 217--228. ACM Press, 2002. Google ScholarDigital Library
- A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211--222. ACM Press, 2003. Google ScholarDigital Library
- B. C. M. Fung, K. Wang, A. W.-C. Fu, and J. Pei. Anonymity for continuous data publishing. In EDBT, pages 264--275. ACM Press, 2008. Google ScholarDigital Library
- S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. CoRR, arXiv:0803.0032v1 {cs.DB}, 2008.Google Scholar
- S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270--299, 1984.Google ScholarCross Ref
- S. P. Kasiviswanathan and A. Smith. A note on differential privacy: Defining resistance to arbitrary side information. CoRR, arXiv:0803.39461 {cs.CR}, 2008.Google Scholar
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE, page 25. IEEE Computer Society, 2006. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization. In KDD, pages 277--286. ACM Press, 2006. Google ScholarDigital Library
- N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, pages 106--115. IEEE Computer Society, 2007.Google ScholarCross Ref
- Y. Lindell. Composition of Secure Multi-Party Protocols: A Comprehensive Study. Springer-Verlag, 2003. Google ScholarDigital Library
- A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: From theory to practice on the map. In ICDE, pages 277--286. IEEE Computer Society, 2008. Google ScholarDigital Library
- A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data, 1(1), 2007. Google ScholarDigital Library
- D. J. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Y. Halpern. Worst-case background knowledge for privacy-preserving data publishing. In ICDE, pages 126--135. IEEE Computer Society, 2007.Google ScholarCross Ref
- F. McSherry and K. Talwar. Differential privacy in mechanism design. In FOCS, pages 94--103. IEEE Computer Society, 2007. Google ScholarDigital Library
- K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC, pages 75--84. ACM Press, 2007. Google ScholarDigital Library
- J. Pei, J. Xu, Z. Wang, W. Wang, and K. Wang. Maintaining k-anonymity against incremental updates. In SSDBM, page 5, 2007. Google ScholarDigital Library
- L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557--570, 2002. Google ScholarDigital Library
- UCI machine learning repository, http://www.ics.uci.edu/ mlearn/databases/.Google Scholar
- A. van den Hout and P. van der Heijden. Randomized response, statistical disclosure control and misclassification: A review. International Statistical Review, 70:269--288, 2002.Google ScholarCross Ref
- K. Wang and B. C. M. Fung. Anonymizing sequential releases. In KDD, pages 414--423. ACM Press, 2006. Google ScholarDigital Library
- S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63--69, 1965.Google ScholarCross Ref
- R. C.-W. Wong, A. W.-C. Fu, K. Wang, and J. Pei. Minimality attack in privacy preserving data publishing. In VLDB, pages 543--554. VLDB Endowment, 2007. Google ScholarDigital Library
- X. Xiao and Y. Tao. Anatomy: Simple and effective privacy preservation. In VLDB, pages 139--150, 2006. Google ScholarDigital Library
- X. Xiao and Y. Tao. Personalized Privacy Preservation In SIGMOD, pages 229--240, 2006. Google ScholarDigital Library
- X. Xiao and Y. Tao. M-invariance: towards privacy preserving re-publication of dynamic datasets. In SIGMOD, pages 689--700. ACM Press, 2007. Google ScholarDigital Library
- C. Yao, X. S. Wang, and S. Jajodia. Checking for k-anonymity violation by views. In VLDB, p. 910--921, 2005. Google ScholarDigital Library
Index Terms
- Composition attacks and auxiliary information in data privacy
Recommendations
Cloning for privacy protection in multiple independent data publications
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge managementData anonymization has become a major technique in privacy preserving data publishing. Many methods have been proposed to anonymize one dataset and a series of datasets of a data owner. However, no method has been proposed for the anonymization of data ...
Preservation of proximity privacy in publishing numerical sensitive data
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataWe identify proximity breach as a privacy threat specific to numerical sensitive attributes in anonymized data publication. Such breach occurs when an adversary concludes with high confidence that the sensitive value of a victim individual must fall in ...
A Survey on Privacy Preserving Dynamic Data Publishing
Many organizations, especially small and medium business SMB enterprises require the collection and sharing of data containing personal information. The privacy of this data must be preserved before outsourcing to the commercial public. Privacy ...
Comments