Abstract
We study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, as measured by convergence rate, of any statistical estimator or learning procedure.
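The local privacy model can be illustrated with a small example. This sketch does not use the mechanism analyzed in the paper. It applies Warner's randomized response to a single binary attribute, and each individual privatizes their own bit before releasing it, so the learner only ever sees noisy reports. The function names, the use of ε-local differential privacy as the privacy measure, and the binary mean-estimation task are assumptions made for this illustration.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Privatize one user's bit locally: keep it with probability e^eps / (1 + e^eps), else flip it.

    For any two inputs, the ratio of output probabilities is at most e^eps,
    so each report satisfies epsilon-local differential privacy (illustrative choice).
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def private_mean_estimate(bits, epsilon: float, seed: int = 0) -> float:
    """Estimate the fraction of ones using only the privatized reports.

    The learner never sees raw bits. It debiases the noisy mean using
    E[report] = (2p - 1) * x + (1 - p), where p is the keep probability.
    """
    rng = random.Random(seed)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    reports = [randomized_response(b, epsilon, rng) for b in bits]
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)


def std_error_bound(n: int, epsilon: float) -> float:
    """Upper bound on the estimator's standard deviation: 1 / (2 * tanh(eps / 2) * sqrt(n))."""
    return 1.0 / (2.0 * math.tanh(epsilon / 2.0) * math.sqrt(n))


if __name__ == "__main__":
    rng = random.Random(1)
    n = 100_000
    # Hypothetical population: roughly 30% of individuals hold the sensitive attribute.
    true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    true_mean = sum(true_bits) / n
    for eps in (4.0, 1.0, 0.25):
        est = private_mean_estimate(true_bits, eps)
        print(f"eps={eps}: error={abs(est - true_mean):.4f}, "
              f"std bound={std_error_bound(n, eps):.4f}")
```

For small ε, tanh(ε/2) ≈ ε/2, so the bound behaves like 1/(ε√n). Matching the accuracy of the non-private estimator therefore takes on the order of 1/ε² times as many samples. This is a qualitative instance of the privacy/utility tradeoff described above, not a reproduction of the paper's sharp bounds.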
Index Terms
- Privacy Aware Learning
Recommendations
When Machine Learning Meets Privacy: A Survey and Outlook
The newly emerged machine learning (e.g., deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as ...
Image Features Anonymization for Privacy Aware Machine Learning
Machine Learning, Optimization, and Data Science
Data privacy is a major public concern in the digital age, especially for image data, which provides a large amount of information. The wariness about the use of image data affects the sharing and publication of these data. In this context, Differential ...
Preserving data privacy in machine learning systems
The wide adoption of Machine Learning to solve a large set of real-life problems came with the need to collect and process large volumes of data, some of which are considered personal and sensitive, raising serious concerns about data protection. ...