research-article

Fast data anonymization with low information loss

Authors:
Gabriel Ghinita

National University of Singapor

National University of Singapor
View Profile

,
Panagiotis Karras

University of Hong Kong

University of Hong Kong
View Profile

,
Panos Kalnis

National University of Singapor

National University of Singapor
View Profile

,
Nikos Mamoulis

University of Hong Kong

University of Hong Kong
View Profile

Authors Info & Claims

VLDB '07: Proceedings of the 33rd international conference on Very large data basesSeptember 2007Pages 758–769

Published:23 September 2007Publication History

VLDB '07: Proceedings of the 33rd international conference on Very large data bases

Pages 758–769

ABSTRACT

Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) l-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost.

In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and l-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.

References

G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu. Achieving Anonymity via Clustering. In Proc. of ACM PODS, pp 153--162, 2006. Google ScholarDigital Library
G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation Algorithms for k-Anonymity. Journal of Privacy Technology, (Paper number: 20051120001), 2005.Google Scholar
R. J. Bayardo and R. Agrawal. Data Privacy through Optimal k-Anonymization. In Proc. of ICDE, pp 217--228, 2005. Google ScholarDigital Library
J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Math. Softw., 3(3):209--226, 1977. Google ScholarDigital Library
A. Froomkin. The Death of Privacy. Stanford Law Review, 52(5):1461--1543, 2000.Google ScholarCross Ref
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing Data Cubes Efficiently. In Proc. of ACM SIGMOD, pp 205--216, 1996. Google ScholarDigital Library
V. S. Iyengar. Transforming Data to Satisfy Privacy Constraints. In Proc. of SIGKDD, pp 279--288, 2002. Google ScholarDigital Library
D. Kifer and J. Gehrke. Injecting Utility into Anonymized Datasets. In Proc. of ACM SIGMOD, pp 217--228, 2006. Google ScholarDigital Library
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient Full-domain k-Anonymity. In Proc. of ACM SIGMOD, pp 49--60, 2005. Google ScholarDigital Library
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian Multidimensional k-Anonymity. In Proc. of ICDE, 2006. Google ScholarDigital Library
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware Anonymization. In Proc. of KDD, pp 277--286, 2006. Google ScholarDigital Library
N. Li, T. Li, and S. Venkatasubramanian. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In Proc. of ICDE, pp 106--115, 2007.Google ScholarCross Ref
A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. In Proc. of ICDE, 2006. Google ScholarDigital Library
A. Meyerson and R. Williams. On the Complexity of Optimal K-anonymity. In Proc. of ACM PODS, pp 223--228, 2004. Google ScholarDigital Library
B. Moon, H. Jagadish, and C. Faloutsos. Analysis of the Clustering Properties of the Hilbert Space-Filling Curve. IEEE TKDE, 13(1):124--141, 2001. Google ScholarDigital Library
P. Samarati and L. Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information (abstract). In PODS (see also Technical Report SRI-CSL-98-04), 1998. Google ScholarDigital Library
L. Sweeney. k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002. Google ScholarDigital Library
Y. Tao and X. Xiao. Personalized Privacy Preservation. In Proc. of ACM SIGMOD, pp 229--240, 2006. Google ScholarDigital Library
R. Wong, A. Fu, J. Pei, K. Wang, S. Wan, and C. Lo. Multidimensional k-anonymization by Linear Clustering Using Space-filling Curves. TR 2006-27, Simon Fraser University, March 2006.Google Scholar
X. Xiao and Y. Tao. Anatomy: Simple and Effective Privacy Preservation. In Proc. of VLDB, pp 139--150, 2006. Google ScholarDigital Library
J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, and A. Fu. Utility-Based Anonymization Using Local Recoding. In Proc. of SIGKDD, pp 785--790, 2006. Google ScholarDigital Library
Q. Zhang, N. Koudas, D. Srivastava, and T. Yu. Aggregate Query Answering on Anonymized Tables. In Proc. of ICDE, pp 116--125, 2007.Google ScholarCross Ref
R. Zhang, P. Kalnis, B. C. Ooi, and K.-L. Tan. Generalized Multidimensional Data Mapping and Query Processing. ACM TODS, 30(3):661--697, 2005. Google ScholarDigital Library

Recommendations

Spectral Anonymization of Data

The goal of data anonymization is to allow the release of scientifically useful data in a form that protects the privacy of its subjects. This requires more than simply removing personal identifiers from the data because an attacker can still use ...
Read More
Utility-based anonymization for privacy preservation with less information loss

Privacy becomes a more and more serious concern in applications involving microdata. Recently, efficient anonymization has attracted much research work. Most of the previous methods use global recoding, which maps the domains of the quasi-identifier ...
Read More
Information based data anonymization for classification utility

Anonymization is a practical approach to protect privacy in data. The major objective of privacy preserving data publishing is to protect private information in data whereas data is still useful for some intended applications, such as building ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
VLDB '07: Proceedings of the 33rd international conference on Very large data bases
September 2007
1443 pages
ISBN:9781595936493
General Chairs:
Wolfgang Klas
University of Vienna, Austria
,
Erich J. Neuhold
Research Studio Digital Memory Engineering, Austria
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 23 September 2007
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 58
  Total Citations
  View Citations
- 1,002
  Total Downloads
- Downloads (Last 12 months)12
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fast data anonymization with low information loss

VLDB '07: Proceedings of the 33rd international conference on Very large data bases

ABSTRACT

References

Cited By

Recommendations

Spectral Anonymization of Data

Utility-based anonymization for privacy preservation with less information loss

Information based data anonymization for classification utility

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Fast data anonymization with low information loss

VLDB '07: Proceedings of the 33rd international conference on Very large data bases

ABSTRACT

References

Cited By

Recommendations

Spectral Anonymization of Data

Utility-based anonymization for privacy preservation with less information loss

Information based data anonymization for classification utility

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media