DOI: 10.1145/3097983.3098029
Research article

KunPeng: Parameter Server based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial

Published: 13 August 2017

ABSTRACT

In recent years, the emergence of Big Data (terabytes or petabytes) and Big Models (tens of billions of parameters) has created an ever-increasing need, in both academia and industry, to parallelize machine learning (ML) algorithms. Although existing distributed computing systems such as Hadoop and Spark can parallelize ML algorithms, they provide only synchronous and coarse-grained operators (e.g., Map, Reduce, and Join), which may hinder developers from implementing more efficient algorithms. This motivated us to design a universal distributed platform, termed KunPeng, that combines distributed systems and parallel optimization algorithms to deal with the complexities that arise from large-scale ML. Specifically, KunPeng not only encapsulates data/model parallelism, load balancing, model synchronization, sparse representation, and industrial fault tolerance, but also provides an easy-to-use interface that lets users focus on the core ML logic. Empirical results on terabytes of real data with billions of samples and features demonstrate that this design brings compelling performance improvements across ML programs ranging from the Follow-the-Regularized-Leader Proximal (FTRL-Proximal) algorithm to Sparse Logistic Regression and Multiple Additive Regression Trees. Furthermore, KunPeng's encouraging performance is also demonstrated in several real-world applications, including Alibaba's Double 11 Online Shopping Festival and Ant Financial's transaction risk estimation.
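The abstract names FTRL-Proximal as one of the algorithms KunPeng parallelizes. As context, the sketch below shows the well-known per-coordinate FTRL-Proximal update for sparse logistic regression (following McMahan et al.'s published update rule). This is a minimal single-machine illustration, not KunPeng's distributed implementation, and all class/parameter names and hyperparameter values are assumptions for illustration only.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for sparse logistic regression (single-machine sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0, dim=0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated (adjusted) gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def weight(self, i):
        # Lazy closed-form weight; the L1 threshold keeps most weights exactly zero.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = -1.0 if z < 0 else 1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        # x: dict of feature index -> value (sparse representation)
        s = sum(self.weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-max(min(s, 35.0), -35.0)))

    def update(self, x, y):
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss w.r.t. weight i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self.weight(i)
            self.n[i] += g * g
        return p
```

In a parameter-server setting such as the one the paper describes, the `z` and `n` accumulators would be sharded across server nodes and pushed/pulled by workers rather than kept in local lists as they are here.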

Supplemental Material

li_distributed_learning.mp4 (mp4, 420.7 MB)


Published in

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017, 2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
