ABSTRACT
When a table containing individual data is published, disclosure of sensitive information should be prohibitive. A naive approach for the problem is to remove identifiers such as name and social security number. However, linking attacks which joins the published table with other tables on some attributes, called quasi-identifier, may reveal the sensitive information. To protect privacy against linking attack, the notion of k-anonymity which makes each record in the table be indistinguishable with k-1 other records has been proposed previously. It is shown to be NP-Hard to k-anonymize a table minimizing the number of suppressed cells. To alleviate this, O(k log k)-approximation and O(k)-approximation algorithms were proposed in previous works.
In this paper, we propose several approximation algorithms that guarantee O(log k)-approximation ratio and perform significantly better than the traditional algorithms. We also provide O(ß log k)-approximate algorithms which gracefully adjust their running time according to the tolerance é (≥ 1) of the approximation ratios. Experimental results confirm that our approximation algorithms perform significantly better than traditional approximation algorithms.
- C. C. Aggarwal. On k-anonymity and the curse of dimensionality. In VLDB, pages 901--909, 2005. Google ScholarDigital Library
- G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu. Achieving anonymity via clustering. In VLDB, 2006.Google ScholarDigital Library
- G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Anonymizing tables. In ICDT, pages 246--258, 2005. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, pages 487--499, 1994. Google ScholarDigital Library
- R. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In ICDE, pages 217--228, 2005. Google ScholarDigital Library
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms (Second Edition). McGraw Hill and MIT Press, 2001.Google Scholar
- D. S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256--278, 1974.Google ScholarDigital Library
- J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD Conference, pages 1--12, 2000. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient full-domain k-anonymity. In SIGMOD, pages 49--60, 2005. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE, page 25, 2006. Google ScholarDigital Library
- A. Machanavajjhala, J. Gehrke, and D. Kifer. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006. Google ScholarDigital Library
- A. Meyerson and R. Williams. On the complexity of optimal k-anonymity. In Proc. of PODS, 2004. Google ScholarDigital Library
- N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In ICDT, pages 398--416, 1999. Google ScholarDigital Library
- U.C. Irvine Machine Learning Repository. http://www.ics.uci.edu/~mlearn/mlsummary.html.Google Scholar
- P. Samarati and L. Sweeny. Generalizing data to provide anonymity when disclosing information (abstract). In In Proc. of ACM Symposium on Principles of Database Systems, page 188, 1998. Google ScholarDigital Library
- L. Sweeny. k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowedge-based Systems, 10(5):557--570, 2002. Google ScholarDigital Library
- L. Willenborg and T. deWaal. Elements of statistical disclosure control. Springer Verlog Lecture Notes in Statistics, 2000.Google Scholar
Index Terms
- Approximate algorithms for K-anonymity
Recommendations
(α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data miningPrivacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the ...
Approximate algorithms with generalizing attribute values for k-anonymity
When a table containing individual data is published, disclosure of sensitive information should be prohibitive. Since simply removing identifiers such as name and social security number may reveal the sensitive information by linking attacks which ...
(α, k)-anonymous data publishing
Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the ...
Comments