DOI: 10.1145/1150402.1150435
Article

Workload-aware anonymization

Published: 20 August 2006

ABSTRACT

Protecting data privacy is an important problem in microdata distribution. Anonymization algorithms typically aim to protect individual privacy, with minimal impact on the quality of the resulting data. While the bulk of previous work has measured quality through one-size-fits-all measures, we argue that quality is best judged with respect to the workload for which the data will ultimately be used. This paper provides a suite of anonymization algorithms that produce an anonymous view based on a target class of workloads, consisting of one or more data mining tasks, as well as selection predicates. An extensive experimental evaluation indicates that this approach is often more effective than previous anonymization techniques.

Published in

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN: 1595933395
DOI: 10.1145/1150402

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
