ABSTRACT
Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) l-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost.
In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and l-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.
- G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu. Achieving Anonymity via Clustering. In Proc. of ACM PODS, pp 153--162, 2006. Google ScholarDigital Library
- G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation Algorithms for k-Anonymity. Journal of Privacy Technology, (Paper number: 20051120001), 2005.Google Scholar
- R. J. Bayardo and R. Agrawal. Data Privacy through Optimal k-Anonymization. In Proc. of ICDE, pp 217--228, 2005. Google ScholarDigital Library
- J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Math. Softw., 3(3):209--226, 1977. Google ScholarDigital Library
- A. Froomkin. The Death of Privacy. Stanford Law Review, 52(5):1461--1543, 2000.Google ScholarCross Ref
- V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing Data Cubes Efficiently. In Proc. of ACM SIGMOD, pp 205--216, 1996. Google ScholarDigital Library
- V. S. Iyengar. Transforming Data to Satisfy Privacy Constraints. In Proc. of SIGKDD, pp 279--288, 2002. Google ScholarDigital Library
- D. Kifer and J. Gehrke. Injecting Utility into Anonymized Datasets. In Proc. of ACM SIGMOD, pp 217--228, 2006. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient Full-domain k-Anonymity. In Proc. of ACM SIGMOD, pp 49--60, 2005. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian Multidimensional k-Anonymity. In Proc. of ICDE, 2006. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware Anonymization. In Proc. of KDD, pp 277--286, 2006. Google ScholarDigital Library
- N. Li, T. Li, and S. Venkatasubramanian. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In Proc. of ICDE, pp 106--115, 2007.Google ScholarCross Ref
- A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. In Proc. of ICDE, 2006. Google ScholarDigital Library
- A. Meyerson and R. Williams. On the Complexity of Optimal K-anonymity. In Proc. of ACM PODS, pp 223--228, 2004. Google ScholarDigital Library
- B. Moon, H. Jagadish, and C. Faloutsos. Analysis of the Clustering Properties of the Hilbert Space-Filling Curve. IEEE TKDE, 13(1):124--141, 2001. Google ScholarDigital Library
- P. Samarati and L. Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information (abstract). In PODS (see also Technical Report SRI-CSL-98-04), 1998. Google ScholarDigital Library
- L. Sweeney. k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002. Google ScholarDigital Library
- Y. Tao and X. Xiao. Personalized Privacy Preservation. In Proc. of ACM SIGMOD, pp 229--240, 2006. Google ScholarDigital Library
- R. Wong, A. Fu, J. Pei, K. Wang, S. Wan, and C. Lo. Multidimensional k-anonymization by Linear Clustering Using Space-filling Curves. TR 2006-27, Simon Fraser University, March 2006.Google Scholar
- X. Xiao and Y. Tao. Anatomy: Simple and Effective Privacy Preservation. In Proc. of VLDB, pp 139--150, 2006. Google ScholarDigital Library
- J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, and A. Fu. Utility-Based Anonymization Using Local Recoding. In Proc. of SIGKDD, pp 785--790, 2006. Google ScholarDigital Library
- Q. Zhang, N. Koudas, D. Srivastava, and T. Yu. Aggregate Query Answering on Anonymized Tables. In Proc. of ICDE, pp 116--125, 2007.Google ScholarCross Ref
- R. Zhang, P. Kalnis, B. C. Ooi, and K.-L. Tan. Generalized Multidimensional Data Mapping and Query Processing. ACM TODS, 30(3):661--697, 2005. Google ScholarDigital Library
Recommendations
Spectral Anonymization of Data
The goal of data anonymization is to allow the release of scientifically useful data in a form that protects the privacy of its subjects. This requires more than simply removing personal identifiers from the data because an attacker can still use ...
Utility-based anonymization for privacy preservation with less information loss
Privacy becomes a more and more serious concern in applications involving microdata. Recently, efficient anonymization has attracted much research work. Most of the previous methods use global recoding, which maps the domains of the quasi-identifier ...
Information based data anonymization for classification utility
Anonymization is a practical approach to protect privacy in data. The major objective of privacy preserving data publishing is to protect private information in data whereas data is still useful for some intended applications, such as building ...
Comments