DOI: 10.1145/3097983.3098029
Research article

KunPeng: Parameter Server based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial

Published: 13 August 2017

ABSTRACT

In recent years, the emergence of Big Data (terabytes or petabytes) and Big Models (tens of billions of parameters) has created an ever-increasing need, in both academia and industry, to parallelize machine learning (ML) algorithms. Although existing distributed computing systems such as Hadoop and Spark can parallelize ML algorithms, they provide only synchronous and coarse-grained operators (e.g., Map, Reduce, and Join), which may hinder developers from implementing more efficient algorithms. This motivated us to design a universal distributed platform, termed KunPeng, that combines distributed systems and parallel optimization algorithms to deal with the complexities that arise from large-scale ML. Specifically, KunPeng not only encapsulates data/model parallelism, load balancing, model synchronization, sparse representation, and industrial fault tolerance, but also provides an easy-to-use interface that lets users focus on the core ML logic. Empirical results on terabytes of real data with billions of samples and features demonstrate that this design brings compelling performance improvements across ML programs ranging from the Follow-the-Regularized-Leader Proximal (FTRL-Proximal) algorithm to Sparse Logistic Regression and Multiple Additive Regression Trees. Furthermore, KunPeng's encouraging performance is also demonstrated in several real-world applications, including Alibaba's Double 11 Online Shopping Festival and Ant Financial's transaction risk estimation.
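The abstract names FTRL-Proximal as one of the algorithms KunPeng parallelizes. As context, the sketch below shows the well-known per-coordinate FTRL-Proximal update for sparse logistic regression (following McMahan et al.'s published update rule). This is a minimal single-machine illustration, not KunPeng's distributed implementation, and all class/parameter names and hyperparameter values are assumptions for illustration only.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for sparse logistic regression (single-machine sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0, dim=0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated (adjusted) gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def weight(self, i):
        # Lazy closed-form weight; the L1 threshold keeps most weights exactly zero.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = -1.0 if z < 0 else 1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        # x: dict of feature index -> value (sparse representation)
        s = sum(self.weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-max(min(s, 35.0), -35.0)))

    def update(self, x, y):
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss w.r.t. weight i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self.weight(i)
            self.n[i] += g * g
        return p
```

In a parameter-server setting such as the one the paper describes, the `z` and `n` accumulators would be sharded across server nodes and pushed/pulled by workers rather than kept in local lists as they are here.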

Supplemental Material

li_distributed_learning.mp4 (mp4, 420.7 MB)


Published in

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017, 2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
