ABSTRACT
The rapid growth of data has raised the importance of privacy-preserving techniques in distributed machine learning. In this paper, we develop a privacy-preserving method for a class of regularized empirical risk minimization (ERM) problems. We first decentralize the learning algorithm using the alternating direction method of multipliers (ADMM), and then propose the method of dual variable perturbation to provide dynamic differential privacy. Under mild convexity and differentiability conditions on the loss function and the regularizer, the mechanism yields a privacy-preserving algorithm. We study the performance of the algorithm in terms of the number of data points required to achieve a bounded error. To design an optimal privacy mechanism, we analyze the fundamental tradeoff between privacy and accuracy, and provide guidelines for choosing the privacy parameters. Numerical experiments on a real-world dataset corroborate the results on the privacy-utility tradeoff and design.
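The scheme the abstract describes can be sketched as consensus ADMM over several nodes, where each node perturbs its dual variable before the local primal update so that the model fitted on its private data is randomized. The sketch below is illustrative only: it uses L2-regularized logistic regression as the ERM instance, per-coordinate Laplace noise as a simplified stand-in for the paper's dual-perturbation noise distribution, and a plain gradient-descent inner solver; the function names and parameters (`admm_private_erm`, `eps`, `rho`) are assumptions, not the paper's notation.

```python
import numpy as np

def admm_private_erm(data, lam=0.1, rho=1.0, eps=1.0, iters=20, seed=0):
    """Consensus ADMM for L2-regularized logistic regression across nodes.

    data: list of (X, y) per node, with y in {-1, +1}.
    Each node adds Laplace noise (scale 1/eps) to its dual variable before
    the primal step -- a simplified illustration of dual variable
    perturbation; the paper's exact mechanism may use a different noise law.
    """
    rng = np.random.default_rng(seed)
    d = data[0][0].shape[1]
    ws = [np.zeros(d) for _ in data]      # local primal variables
    duals = [np.zeros(d) for _ in data]   # local dual variables
    w_bar = np.zeros(d)                   # consensus variable
    for _ in range(iters):
        for i, (X, y) in enumerate(data):
            noisy_dual = duals[i] + rng.laplace(scale=1.0 / eps, size=d)
            ws[i] = _primal_step(X, y, lam, rho, w_bar, noisy_dual, ws[i])
        w_bar = np.mean(ws, axis=0)       # consensus (averaging) update
        for i in range(len(data)):
            duals[i] += rho * (ws[i] - w_bar)  # standard dual ascent step
    return w_bar

def _primal_step(X, y, lam, rho, w_bar, dual, w0, steps=100, lr=0.1):
    """Minimize local logistic loss + regularizer + ADMM augmentation."""
    w = w0.copy()
    n = len(y)
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n  # logistic loss grad
        grad += lam * w + dual + rho * (w - w_bar)
        w -= lr * grad
    return w
```

With a large `eps` the noise is negligible and the iteration behaves like ordinary consensus ADMM; shrinking `eps` injects more dual noise, which is the privacy-accuracy tradeoff the paper analyzes.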
A Dual Perturbation Approach for Differential Private ADMM-Based Distributed Empirical Risk Minimization