Abstract
Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multimodal data analysis. Large deep learning models are developed to learn rich representations of complex data. Two challenges must be overcome before deep learning can be widely adopted in multimedia and other applications. One is usability: nonexperts must be able to implement different models and training algorithms without much effort, especially when the models are large and complex. The other is scalability: the deep learning system must be able to provision the huge amount of computing resources required to train large models on massive datasets. To address these two challenges, in this article we design a distributed deep learning platform called SINGA, which has an intuitive programming model based on the layer abstraction common to deep learning models. Good scalability is achieved through a flexible distributed training architecture and specific optimization techniques. SINGA runs on both GPUs and CPUs, and we show that it outperforms many other state-of-the-art deep learning systems. Our experience developing and training deep learning models for real-life multimedia applications in SINGA shows that the platform is both usable and scalable.
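To make the "common layer abstraction" concrete, the sketch below shows the general idea in plain Python: a model is an ordered stack of layers, each transforming the features produced by the layer below. This is an illustrative sketch only, not SINGA's actual API; the names (`Layer`, `Dense`, `ReLU`, `Net`, `forward`) are hypothetical.

```python
# Hypothetical sketch of a layer abstraction, NOT SINGA's actual API.
# Each layer exposes forward(); a network is just an ordered list of layers.

class Layer:
    """Base class: a layer maps input features to output features."""
    def forward(self, x):
        raise NotImplementedError

class Dense(Layer):
    """Fully connected layer: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    def __init__(self, weights, bias):
        self.weights = weights  # list of rows, one per input dimension
        self.bias = bias
    def forward(self, x):
        out_dim = len(self.bias)
        return [sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.bias[j]
                for j in range(out_dim)]

class ReLU(Layer):
    """Elementwise nonlinearity: max(0, v)."""
    def forward(self, x):
        return [max(0.0, v) for v in x]

class Net:
    """A model is a stack of layers; forward() threads data through them."""
    def __init__(self, layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

net = Net([
    Dense([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.0]),  # 2 inputs -> 2 outputs
    ReLU(),
])
print(net.forward([1.0, 2.0]))  # -> [2.0, 3.0]
```

Under this kind of abstraction, a training system can parallelize transparently, for example by replicating the layer stack across workers (data parallelism) or partitioning layers among them (model parallelism), without changing the user-facing model definition.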
Index Terms
- Deep Learning at Scale and at Ease