ABSTRACT
We address the privacy-preserving classification problem in a distributed system. Randomization has been the approach proposed to preserve privacy in such a scenario. However, this approach has been shown to be insecure: privacy intrusion techniques can reconstruct private information from the randomized data tuples. We introduce a scheme based on algebraic techniques. Compared to the randomization approach, our new scheme builds more accurate classifiers while disclosing less private information. Furthermore, the new scheme can be readily integrated as middleware with existing systems.
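To make the randomization baseline concrete, the following is a minimal sketch of one common form of it, additive noise perturbation, in which each data provider adds independent zero-mean noise to its value before submitting it. This is an illustrative assumption about what "randomization" refers to here, not the algebraic scheme introduced in this paper; the function name `perturb`, the uniform noise distribution, and the synthetic age data are all hypothetical.

```python
import random
import statistics


def perturb(values, noise_scale, rng):
    """Add independent zero-mean uniform noise to each private value.

    Hypothetical helper: models every data provider randomizing its own
    value before sending it to the data miner.
    """
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]


# Synthetic private attribute (assumed data, for illustration only).
rng = random.Random(0)
true_ages = [rng.randint(20, 60) for _ in range(10000)]

# What the data miner actually receives.
randomized = perturb(true_ages, noise_scale=30.0, rng=rng)

# Because the noise has zero mean, aggregate statistics survive
# randomization even though each individual value is distorted.
true_mean = statistics.mean(true_ages)
estimated_mean = statistics.mean(randomized)
print(f"true mean: {true_mean:.2f}, mean of randomized data: {estimated_mean:.2f}")
```

The sketch shows only why a miner can still learn aggregate properties from randomized tuples. It does not model the reconstruction attacks mentioned above, which exploit structure in the data to recover individual values.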
Index Terms
- A new scheme on privacy-preserving data classification