ABSTRACT
Recent research has demonstrated the potential of deploying deep neural networks (DNNs) on resource-constrained mobile platforms by trimming down network complexity with various compression techniques. Current practice, however, investigates only stand-alone compression schemes, even though each compression technique may suit only certain types of DNN layers. Moreover, these techniques are optimized solely for DNN inference accuracy, without explicitly considering other application-driven system performance metrics (e.g., latency and energy cost) or the varying resource availability across platforms (e.g., storage and processing capability). In this paper, we explore the desirable tradeoff between performance and resource constraints dictated by user-specified needs, from a holistic system-level viewpoint. Specifically, we develop a usage-driven selection framework, referred to as AdaDeep, that automatically selects a combination of compression techniques for a given DNN to achieve an optimal balance between user-specified performance goals and resource constraints. In an extensive evaluation on five public datasets and across twelve mobile devices, AdaDeep achieves up to 9.8x latency reduction, 4.3x energy efficiency improvement, and 38x storage reduction in DNNs while incurring negligible accuracy loss. AdaDeep also uncovers multiple effective combinations of compression techniques unexplored in the existing literature.
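To make the selection problem concrete, the sketch below frames it as choosing the combination of compression techniques that maximizes a user-weighted objective (accuracy versus latency and energy) subject to device resource budgets. This is an illustrative assumption, not AdaDeep's actual algorithm or API: the technique names, the `evaluate` placeholder, and the weight/budget keys are all hypothetical, and the exhaustive loop stands in for whatever automated search the framework performs.

```python
# Hypothetical sketch of usage-driven compression selection.
# Not AdaDeep's implementation; all names below are illustrative placeholders.
from itertools import combinations

# Candidate compression techniques (placeholder names).
TECHNIQUES = ["weight_pruning", "weight_quantization",
              "low_rank_factorization", "depthwise_separable_conv"]

def evaluate(model, techniques):
    """Placeholder: apply `techniques` to `model`, then measure
    (accuracy, latency_ms, energy_mJ, storage_MB) on the target device."""
    raise NotImplementedError

def select_compression(model, weights, budgets):
    """Return the technique combination that maximizes the user-weighted
    objective while satisfying the device's resource budgets."""
    best_combo, best_score = None, float("-inf")
    for k in range(1, len(TECHNIQUES) + 1):
        for combo in combinations(TECHNIQUES, k):
            acc, latency, energy, storage = evaluate(model, combo)
            # Discard combinations that violate any resource budget.
            if latency > budgets["latency_ms"] or storage > budgets["storage_MB"]:
                continue
            # User-specified weights trade accuracy against runtime costs.
            score = (weights["acc"] * acc
                     - weights["latency"] * latency
                     - weights["energy"] * energy)
            if score > best_score:
                best_combo, best_score = combo, score
    return best_combo
```

In practice the exhaustive enumeration above would be replaced by the framework's automated selection, but the objective and budget structure conveys how user-specified needs steer the choice of compression techniques.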