A domain-specific architecture for deep neural networks

Abstract
Tensor Processing Units (TPUs) improve the performance per watt of neural networks in Google datacenters by roughly 50x.