Large-scale matrix factorization with distributed stochastic gradient descent

ABSTRACT
We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements. Our approach rests on stochastic gradient descent (SGD), an iterative stochastic optimization algorithm. We first develop a novel "stratified" SGD variant (SSGD) that applies to general loss-minimization problems in which the loss function can be expressed as a weighted sum of "stratum losses." We establish sufficient conditions for convergence of SSGD using results from stochastic approximation theory and regenerative process theory. We then specialize SSGD to obtain a new matrix-factorization algorithm, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce. DSGD can handle a wide variety of matrix factorizations. We describe the practical techniques used to optimize performance in our DSGD implementation. Experiments suggest that DSGD converges significantly faster and has better scalability properties than alternative algorithms.