skip to main content
10.5555/1325851.1325938dlproceedingsArticle/Chapter ViewAbstractPublication PagesvldbConference Proceedingsconference-collections
research-article

Fast data anonymization with low information loss

Published:23 September 2007Publication History

ABSTRACT

Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) l-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost.

In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and l-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.

References

  1. G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu. Achieving Anonymity via Clustering. In Proc. of ACM PODS, pp 153--162, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation Algorithms for k-Anonymity. Journal of Privacy Technology, (Paper number: 20051120001), 2005.Google ScholarGoogle Scholar
  3. R. J. Bayardo and R. Agrawal. Data Privacy through Optimal k-Anonymization. In Proc. of ICDE, pp 217--228, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Math. Softw., 3(3):209--226, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Froomkin. The Death of Privacy. Stanford Law Review, 52(5):1461--1543, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  6. V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing Data Cubes Efficiently. In Proc. of ACM SIGMOD, pp 205--216, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. V. S. Iyengar. Transforming Data to Satisfy Privacy Constraints. In Proc. of SIGKDD, pp 279--288, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Kifer and J. Gehrke. Injecting Utility into Anonymized Datasets. In Proc. of ACM SIGMOD, pp 217--228, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient Full-domain k-Anonymity. In Proc. of ACM SIGMOD, pp 49--60, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian Multidimensional k-Anonymity. In Proc. of ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware Anonymization. In Proc. of KDD, pp 277--286, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. N. Li, T. Li, and S. Venkatasubramanian. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In Proc. of ICDE, pp 106--115, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  13. A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. In Proc. of ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Meyerson and R. Williams. On the Complexity of Optimal K-anonymity. In Proc. of ACM PODS, pp 223--228, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. Moon, H. Jagadish, and C. Faloutsos. Analysis of the Clustering Properties of the Hilbert Space-Filling Curve. IEEE TKDE, 13(1):124--141, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. Samarati and L. Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information (abstract). In PODS (see also Technical Report SRI-CSL-98-04), 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. L. Sweeney. k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Tao and X. Xiao. Personalized Privacy Preservation. In Proc. of ACM SIGMOD, pp 229--240, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. Wong, A. Fu, J. Pei, K. Wang, S. Wan, and C. Lo. Multidimensional k-anonymization by Linear Clustering Using Space-filling Curves. TR 2006-27, Simon Fraser University, March 2006.Google ScholarGoogle Scholar
  20. X. Xiao and Y. Tao. Anatomy: Simple and Effective Privacy Preservation. In Proc. of VLDB, pp 139--150, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, and A. Fu. Utility-Based Anonymization Using Local Recoding. In Proc. of SIGKDD, pp 785--790, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Q. Zhang, N. Koudas, D. Srivastava, and T. Yu. Aggregate Query Answering on Anonymized Tables. In Proc. of ICDE, pp 116--125, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  23. R. Zhang, P. Kalnis, B. C. Ooi, and K.-L. Tan. Generalized Multidimensional Data Mapping and Query Processing. ACM TODS, 30(3):661--697, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image DL Hosted proceedings
    VLDB '07: Proceedings of the 33rd international conference on Very large data bases
    September 2007
    1443 pages
    ISBN:9781595936493

    Publisher

    VLDB Endowment

    Publication History

    • Published: 23 September 2007

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader