DOI: 10.1145/2020408.2020426

Large-scale matrix factorization with distributed stochastic gradient descent

Published: 21 August 2011

ABSTRACT

We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements. Our approach rests on stochastic gradient descent (SGD), an iterative stochastic optimization algorithm. We first develop a novel "stratified" SGD variant (SSGD) that applies to general loss-minimization problems in which the loss function can be expressed as a weighted sum of "stratum losses." We establish sufficient conditions for convergence of SSGD using results from stochastic approximation theory and regenerative process theory. We then specialize SSGD to obtain a new matrix-factorization algorithm, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce. DSGD can handle a wide variety of matrix factorizations. We describe the practical techniques used to optimize performance in our DSGD implementation. Experiments suggest that DSGD converges significantly faster and has better scalability properties than alternative algorithms.
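
To make the stratification idea concrete, the following is a minimal single-process sketch of blocked SGD for a squared-loss factorization V ≈ WH, in the spirit of the algorithm described above. It is an illustration under assumed choices, not the paper's DSGD implementation: the function name, loss, regularization, and fixed step size are placeholders, zeros are treated as missing entries, and the d independent blocks of each stratum, which DSGD would hand to separate MapReduce workers, are simply processed one after another here.

```python
import numpy as np

def blocked_sgd_factorize(V, rank=10, d=4, epochs=20, lr=0.01, reg=0.05, seed=0):
    """Toy stratified SGD for V ~= W @ H (squared loss, L2 regularization).

    Rows and columns are split into d blocks each. In every sub-epoch, the d
    chosen blocks form a "stratum": they share no rows or columns, so their
    SGD updates touch disjoint pieces of W and H and could run on independent
    workers. Here they are processed sequentially for clarity.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = 0.1 * rng.standard_normal((m, rank))
    H = 0.1 * rng.standard_normal((rank, n))
    row_blocks = np.array_split(np.arange(m), d)
    col_blocks = np.array_split(np.arange(n), d)

    for _ in range(epochs):
        for shift in range(d):                    # one sub-epoch per stratum
            for b in range(d):                    # d interchangeable blocks
                rows = row_blocks[b]
                cols = col_blocks[(b + shift) % d]
                for i in rows:
                    for j in cols:
                        if V[i, j] == 0:          # zeros treated as unobserved
                            continue
                        err = V[i, j] - W[i] @ H[:, j]
                        grad_w = -err * H[:, j] + reg * W[i]
                        grad_h = -err * W[i] + reg * H[:, j]
                        W[i] -= lr * grad_w       # updates stay inside the block
                        H[:, j] -= lr * grad_h
    return W, H

if __name__ == "__main__":
    # Toy usage: a 40x30 matrix with roughly 30% observed entries.
    rng = np.random.default_rng(1)
    V = np.where(rng.random((40, 30)) < 0.3, rng.random((40, 30)), 0.0)
    W, H = blocked_sgd_factorize(V, rank=5)
    mask = V != 0
    rmse = np.sqrt(np.mean((V[mask] - (W @ H)[mask]) ** 2))
    print(f"training RMSE on observed entries: {rmse:.3f}")
```

The property exploited here, and made rigorous in the paper via stratum losses, is that blocks within one stratum are interchangeable: because they touch disjoint rows of W and columns of H, processing them sequentially or in parallel yields the same parameter updates, which is what makes the MapReduce distribution possible.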


Published in

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011, 1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408

      Copyright © 2011 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


