SkippyNN: An Embedded Stochastic-Computing Accelerator for Convolutional Neural Networks

ABSTRACT
Employing convolutional neural networks (CNNs) in embedded devices calls for novel low-cost and energy-efficient CNN accelerators. Stochastic computing (SC) is a promising low-cost alternative to conventional binary implementations of CNNs. Despite this cost advantage, SC-based arithmetic units suffer from prohibitive execution times because they process long bit-streams. In particular, multiplication, the dominant operation in convolution, is extremely time-consuming, which hinders the use of SC methods in embedded CNNs.
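To illustrate why SC multiplication is cheap in hardware yet slow in time, the following minimal Python sketch emulates unipolar stochastic multiplication, which in hardware is a single AND gate per bit-pair. This is our illustration, not the paper's design; the helper names (`to_stream`, `sc_multiply`) and the stream length are assumptions for demonstration only.

```python
import random

def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length=4096, seed=0):
    """Unipolar SC multiplication: bitwise AND of two independent streams."""
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    # The fraction of positions where both streams are 1 estimates a * b.
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length

print(sc_multiply(0.5, 0.6))  # ~0.30; higher precision demands much longer streams
```

The accuracy of the estimate grows only with stream length, which is exactly the execution-time bottleneck the abstract refers to.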
In this work, we propose a novel architecture, called SkippyNN, that reduces the computation time of SC-based multiplications in the convolutional layers of CNNs. Each convolution in a CNN comprises numerous multiplications in which each input value is multiplied by a weight vector. Once the result of the first multiplication is produced, the following multiplications can be performed by multiplying the input by the differences of successive weights. Leveraging this property, we develop a differential Multiply-and-Accumulate unit, called DMAC, to reduce the time consumed by convolutions in SkippyNN. We evaluate the efficiency of SkippyNN using four modern CNNs. On average, SkippyNN offers 1.2x speedup and 2.7x energy saving compared to binary implementations of CNN accelerators.
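To make the reuse concrete, the sketch below shows the arithmetic identity that DMAC exploits, in plain Python rather than SC hardware. This is our illustration under the abstract's description; the function name `differential_products` is ours. The point is that each new product is obtained from the previous one by multiplying the input only by a weight difference, which is small when successive weights are close in value and can therefore be represented by a much shorter bit-stream in SC.

```python
def differential_products(x, weights):
    """Compute [x * w for w in weights], reusing each previous product.

    At every step, only x * (w_i - w_{i-1}) is computed from scratch;
    the rest of the product is carried over from the previous result.
    """
    products, prev, prev_w = [], 0.0, 0.0
    for w in weights:
        prev += x * (w - prev_w)  # multiply only by the small difference
        products.append(prev)
        prev_w = w
    return products

# Sanity check against direct multiplication.
x, weights = 0.5, [0.30, 0.28, 0.31, 0.27]
direct = [x * w for w in weights]
assert all(abs(d - p) < 1e-9
           for d, p in zip(direct, differential_products(x, weights)))
```

The telescoping sum guarantees the differential results match direct multiplication exactly; the savings come from the shorter bit-streams needed to encode the differences.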