A domain-specific architecture for deep neural networks

Abstract
Tensor Processing Units (TPUs) improve the performance per watt of neural networks in Google datacenters by roughly 50x.