Abstract
Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multimodal data analysis. Large deep learning models are developed to learn rich representations of complex data. Two challenges must be overcome before deep learning can be widely adopted in multimedia and other applications. One is usability: nonexperts must be able to implement different models and training algorithms without much effort, especially when the models are large and complex. The other is scalability: the deep learning system must be able to provision the huge amount of computing resources required to train large models on massive datasets. To address these two challenges, in this article we design a distributed deep learning platform called SINGA, which has an intuitive programming model based on the layer abstraction common to deep learning models. Good scalability is achieved through a flexible distributed training architecture and specific optimization techniques. SINGA runs on both GPUs and CPUs, and we show that it outperforms many other state-of-the-art deep learning systems. Our experience developing and training deep learning models for real-life multimedia applications in SINGA shows that the platform is both usable and scalable.
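To make the "common layer abstraction" concrete, the sketch below shows the general idea in plain Python: a model is an ordered stack of layers, each transforming the features produced by the layer below. This is an illustrative sketch only, not SINGA's actual API; the names (`Layer`, `Dense`, `ReLU`, `Net`, `forward`) are hypothetical.

```python
# Hypothetical sketch of a layer abstraction, NOT SINGA's actual API.
# Each layer exposes forward(); a network is just an ordered list of layers.

class Layer:
    """Base class: a layer maps input features to output features."""
    def forward(self, x):
        raise NotImplementedError

class Dense(Layer):
    """Fully connected layer: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    def __init__(self, weights, bias):
        self.weights = weights  # list of rows, one per input dimension
        self.bias = bias
    def forward(self, x):
        out_dim = len(self.bias)
        return [sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.bias[j]
                for j in range(out_dim)]

class ReLU(Layer):
    """Elementwise nonlinearity: max(0, v)."""
    def forward(self, x):
        return [max(0.0, v) for v in x]

class Net:
    """A model is a stack of layers; forward() threads data through them."""
    def __init__(self, layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

net = Net([
    Dense([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.0]),  # 2 inputs -> 2 outputs
    ReLU(),
])
print(net.forward([1.0, 2.0]))  # -> [2.0, 3.0]
```

Under this kind of abstraction, a training system can parallelize transparently, for example by replicating the layer stack across workers (data parallelism) or partitioning layers among them (model parallelism), without changing the user-facing model definition.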
Index Terms
- Deep Learning at Scale and at Ease