research-article

Composition attacks and auxiliary information in data privacy

Authors:
Srivatsava Ranjit Ganta

Pennsylvania State University, University Park, PA, USA

Pennsylvania State University, University Park, PA, USA
View Profile

,
Shiva Prasad Kasiviswanathan

Pennsylvania State University, University Park, PA, USA

Pennsylvania State University, University Park, PA, USA
View Profile

,
Adam Smith

Pennsylvania State University, University Park, PA, USA

Pennsylvania State University, University Park, PA, USA
View Profile

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2008Pages 265–273https://doi.org/10.1145/1401890.1401926

Published:24 August 2008Publication History

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 265–273

ABSTRACT

Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason about privacy in the face of rich, realistic sources of auxiliary information. Specifically, we investigate the effectiveness of current anonymization schemes in preserving privacy when multiple organizations independently release anonymized data about overlapping populations.

1. We investigate composition attacks, in which an adversary uses independent anonymized releases to breach privacy. We explain why recently proposed models of limited auxiliary information fail to capture composition attacks. Our experiments demonstrate that even a simple instance of a composition attack can breach privacy in practice, for a large class of currently proposed techniques. The class includes k-anonymity and several recent variants.

2. On a more positive note, certain randomization-based notions of privacy (such as differential privacy) provably resist composition attacks and, in fact, the use of arbitrary side information.This resistance enables "stand-alone" design of anonymization schemes, without the need for explicitly keeping track of other releases.

We provide a precise formulation of this property, and prove that an important class of relaxations of differential privacy also satisfy the property. This significantly enlarges the class of protocols known to enable modular design.

References

J. Official Statistics, special issue on disclosure limitation methods, Vol. 14, No. 4, 1998.Google Scholar
R. Agrawal and R. Srikant. Privacy-preserving data mining. In SIGMOD, pages 439--450. ACM Press, 2000. Google ScholarDigital Library
S. Agrawal and J. R. Haritsa. A framework for high-accuracy privacy-preserving mining. In ICDE, pages 193--204. IEEE Computer Society, 2005. Google ScholarDigital Library
B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS, pages 273--282. ACM Press, 2007. Google ScholarDigital Library
M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, Aug. 2006.Google Scholar
A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: The SuLQ framework. In PODS, pages 128--138. ACM Press, 2005. Google ScholarDigital Library
A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, pages 609--618. ACM Press, 2008. Google ScholarDigital Library
J.-W. Byun, Y. Sohn, E. Bertino, and N. Li. Secure anonymization for incremental datasets. In Secure Data Management, pages 48--63. Springer, 2006. Google ScholarDigital Library
K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In CRYPTO, pages 198--213. Springer, 2006. Google ScholarDigital Library
B.-C. Chen, R. Ramakrishnan, and K. LeFevre. Privacy skyline: Privacy with multidimensional adversarial knowledge. In VLDB, pages 770--781. VLDB Endowment, 2007. Google ScholarDigital Library
I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, pages 202--210. ACM Press, 2003. Google ScholarDigital Library
J. Domingo-Ferrer and J. M. Mateo-Sanz. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, 14(1):189--201, 2002. Google ScholarDigital Library
C. Dwork. Differential privacy. In ICALP, pages 1--12. Springer, 2006. Google ScholarDigital Library
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503. Springer, 2006. Google ScholarDigital Library
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503. Springer, 2006. Google ScholarDigital Library
C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, pages 528--544. Springer, 2004.Google ScholarCross Ref
A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In KDD, pages 217--228. ACM Press, 2002. Google ScholarDigital Library
A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211--222. ACM Press, 2003. Google ScholarDigital Library
B. C. M. Fung, K. Wang, A. W.-C. Fu, and J. Pei. Anonymity for continuous data publishing. In EDBT, pages 264--275. ACM Press, 2008. Google ScholarDigital Library
S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. CoRR, arXiv:0803.0032v1 {cs.DB}, 2008.Google Scholar
S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270--299, 1984.Google ScholarCross Ref
S. P. Kasiviswanathan and A. Smith. A note on differential privacy: Defining resistance to arbitrary side information. CoRR, arXiv:0803.39461 {cs.CR}, 2008.Google Scholar
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE, page 25. IEEE Computer Society, 2006. Google ScholarDigital Library
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization. In KDD, pages 277--286. ACM Press, 2006. Google ScholarDigital Library
N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, pages 106--115. IEEE Computer Society, 2007.Google ScholarCross Ref
Y. Lindell. Composition of Secure Multi-Party Protocols: A Comprehensive Study. Springer-Verlag, 2003. Google ScholarDigital Library
A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: From theory to practice on the map. In ICDE, pages 277--286. IEEE Computer Society, 2008. Google ScholarDigital Library
A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data, 1(1), 2007. Google ScholarDigital Library
D. J. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Y. Halpern. Worst-case background knowledge for privacy-preserving data publishing. In ICDE, pages 126--135. IEEE Computer Society, 2007.Google ScholarCross Ref
F. McSherry and K. Talwar. Differential privacy in mechanism design. In FOCS, pages 94--103. IEEE Computer Society, 2007. Google ScholarDigital Library
K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC, pages 75--84. ACM Press, 2007. Google ScholarDigital Library
J. Pei, J. Xu, Z. Wang, W. Wang, and K. Wang. Maintaining k-anonymity against incremental updates. In SSDBM, page 5, 2007. Google ScholarDigital Library
L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557--570, 2002. Google ScholarDigital Library
UCI machine learning repository, http://www.ics.uci.edu/ mlearn/databases/.Google Scholar
A. van den Hout and P. van der Heijden. Randomized response, statistical disclosure control and misclassification: A review. International Statistical Review, 70:269--288, 2002.Google ScholarCross Ref
K. Wang and B. C. M. Fung. Anonymizing sequential releases. In KDD, pages 414--423. ACM Press, 2006. Google ScholarDigital Library
S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63--69, 1965.Google ScholarCross Ref
R. C.-W. Wong, A. W.-C. Fu, K. Wang, and J. Pei. Minimality attack in privacy preserving data publishing. In VLDB, pages 543--554. VLDB Endowment, 2007. Google ScholarDigital Library
X. Xiao and Y. Tao. Anatomy: Simple and effective privacy preservation. In VLDB, pages 139--150, 2006. Google ScholarDigital Library
X. Xiao and Y. Tao. Personalized Privacy Preservation In SIGMOD, pages 229--240, 2006. Google ScholarDigital Library
X. Xiao and Y. Tao. M-invariance: towards privacy preserving re-publication of dynamic datasets. In SIGMOD, pages 689--700. ACM Press, 2007. Google ScholarDigital Library
C. Yao, X. S. Wang, and S. Jajodia. Checking for k-anonymity violation by views. In VLDB, p. 910--921, 2005. Google ScholarDigital Library

Index Terms

Composition attacks and auxiliary information in data privacy
1. Information systems
  1. Data management systems

Recommendations

Cloning for privacy protection in multiple independent data publications
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

Data anonymization has become a major technique in privacy preserving data publishing. Many methods have been proposed to anonymize one dataset and a series of datasets of a data owner. However, no method has been proposed for the anonymization of data ...
Read More
Preservation of proximity privacy in publishing numerical sensitive data
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

We identify proximity breach as a privacy threat specific to numerical sensitive attributes in anonymized data publication. Such breach occurs when an adversary concludes with high confidence that the sensitive value of a victim individual must fall in ...
Read More
A Survey on Privacy Preserving Dynamic Data Publishing

Many organizations, especially small and medium business SMB enterprises require the collection and sharing of data containing personal information. The privacy of this data must be preserved before outsourcing to the commercial public. Privacy ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN:9781605581934
DOI:10.1145/1401890
General Chair:
Ying Li
Microsoft adCenter Labs
,
Program Chairs:
Bing Liu
University of Illinois at Chicago
,
Sunita Sarawagi
Indian Institute of Technology, Bombay
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 August 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
adversarial attacks
anonymization
privacy
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '08 Paper Acceptance Rate118of593submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 234
  Total Citations
  View Citations
- 1,652
  Total Downloads
- Downloads (Last 12 months)71
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Composition attacks and auxiliary information in data privacy

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cloning for privacy protection in multiple independent data publications

Preservation of proximity privacy in publishing numerical sensitive data

A Survey on Privacy Preserving Dynamic Data Publishing

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Composition attacks and auxiliary information in data privacy

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cloning for privacy protection in multiple independent data publications

Preservation of proximity privacy in publishing numerical sensitive data

A Survey on Privacy Preserving Dynamic Data Publishing

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media