research-article

Privacy Aware Learning

Authors:
John C. Duchi, University of California, Berkeley, CA
Michael I. Jordan, University of California, Berkeley, CA
Martin J. Wainwright, University of California, Berkeley, CA

Journal of the ACM, Volume 61, Issue 6, Article No. 38, pp. 1–57
https://doi.org/10.1145/2666468

Published: 17 December 2014

Abstract

We study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, as measured by convergence rate, of any statistical estimator or learning procedure.
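
To make the privacy-utility tradeoff described in the abstract concrete, here is a minimal sketch using randomized response, a classical locally private mechanism for a single bit. It is an illustration only and is not claimed to be the construction analyzed in the article. The function names, the parameter `epsilon` (the local differential privacy level), and the simulation setup are all assumptions introduced for this example.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    The ratio of report probabilities for the two possible inputs is at most
    e^eps, which is the epsilon-local-differential-privacy condition.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_mean(reports, epsilon):
    """Debias the average of privatized bits to estimate the true proportion."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[report] = (2p - 1) * mu + (1 - p); invert that affine map.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def std_error_bound(n, epsilon):
    """Upper bound on the estimator's standard deviation.

    Each report is a Bernoulli variable, so the variance of the observed
    average is at most 1 / (4n); debiasing divides by 2p - 1 = tanh(eps / 2).
    """
    return 1.0 / (2.0 * math.sqrt(n) * math.tanh(epsilon / 2.0))


def simulate(true_mean, n, epsilon, seed=0):
    """Draw n private bits, privatize each one locally, and estimate the mean."""
    rng = random.Random(seed)
    data = [1 if rng.random() < true_mean else 0 for _ in range(n)]
    reports = [randomized_response(b, epsilon, rng) for b in data]
    return estimate_mean(reports, epsilon)


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 2.0):
        est = simulate(true_mean=0.3, n=20000, epsilon=eps)
        print(f"epsilon={eps}: estimate={est:.4f}, "
              f"std bound={std_error_bound(20000, eps):.4f}")
```

In this sketch the learner only ever sees the privatized reports, never the raw bits. The bound in `std_error_bound` grows roughly like 1/epsilon for small `epsilon`, so stronger privacy is paid for with a slower effective convergence rate at a fixed sample size.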

Published in
Journal of the ACM Volume 61, Issue 6
November 2014
285 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/2700084
Editor: Victor Vianu, University of California, San Diego
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 December 2014
- Accepted: 1 June 2014
- Revised: 1 October 2013
- Received: 1 October 2012
Author Tags
Differential privacy
lower bounds
machine learning
minimax convergence rates
saddle points
Qualifiers
- research-article
- Research
- Refereed