SkippyNN: An Embedded Stochastic-Computing Accelerator for Convolutional Neural Networks

ABSTRACT
Employing convolutional neural networks (CNNs) in embedded devices calls for novel low-cost and energy-efficient CNN accelerators. Stochastic computing (SC) is a promising low-cost alternative to conventional binary implementations of CNNs. Despite this cost advantage, SC-based arithmetic units suffer from prohibitive execution times because they process long bit-streams. In particular, multiplication, the dominant operation in convolution, is extremely time-consuming, which hinders the use of SC methods in embedded CNNs.
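To illustrate why SC multiplication is cheap in hardware yet slow in time, the following minimal Python sketch emulates unipolar stochastic multiplication, which in hardware is a single AND gate per bit-pair. This is our illustration, not the paper's design; the helper names (`to_stream`, `sc_multiply`) and the stream length are assumptions for demonstration only.

```python
import random

def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length=4096, seed=0):
    """Unipolar SC multiplication: bitwise AND of two independent streams."""
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    # The fraction of positions where both streams are 1 estimates a * b.
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length

print(sc_multiply(0.5, 0.6))  # ~0.30; higher precision demands much longer streams
```

The accuracy of the estimate grows only with stream length, which is exactly the execution-time bottleneck the abstract refers to.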
In this work, we propose a novel architecture, called SkippyNN, that reduces the computation time of SC-based multiplications in the convolutional layers of CNNs. Each convolution in a CNN comprises numerous multiplications in which each input value is multiplied by a weight vector. Once the result of the first multiplication is produced, the following multiplications can be performed by multiplying the input by the differences of successive weights. Leveraging this property, we develop a differential Multiply-and-Accumulate unit, called DMAC, to reduce the time consumed by convolutions in SkippyNN. We evaluate the efficiency of SkippyNN using four modern CNNs. On average, SkippyNN offers 1.2x speedup and 2.7x energy saving compared to binary implementations of CNN accelerators.
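To make the reuse concrete, the sketch below shows the arithmetic identity that DMAC exploits, in plain Python rather than SC hardware. This is our illustration under the abstract's description; the function name `differential_products` is ours. The point is that each new product is obtained from the previous one by multiplying the input only by a weight difference, which is small when successive weights are close in value and can therefore be represented by a much shorter bit-stream in SC.

```python
def differential_products(x, weights):
    """Compute [x * w for w in weights], reusing each previous product.

    At every step, only x * (w_i - w_{i-1}) is computed from scratch;
    the rest of the product is carried over from the previous result.
    """
    products, prev, prev_w = [], 0.0, 0.0
    for w in weights:
        prev += x * (w - prev_w)  # multiply only by the small difference
        products.append(prev)
        prev_w = w
    return products

# Sanity check against direct multiplication.
x, weights = 0.5, [0.30, 0.28, 0.31, 0.27]
direct = [x * w for w in weights]
assert all(abs(d - p) < 1e-9
           for d, p in zip(direct, differential_products(x, weights)))
```

The telescoping sum guarantees the differential results match direct multiplication exactly; the savings come from the shorter bit-streams needed to encode the differences.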