Research article | Open Access

A domain-specific architecture for deep neural networks

Published: 22 August 2018

Abstract

Tensor processing units improve performance per watt of neural networks in Google datacenters by roughly 50x.
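For readers unfamiliar with the metric, performance per watt is sustained throughput divided by power draw, and the abstract's "roughly 50x" is that quantity for a TPU divided by the same quantity for a contemporary CPU or GPU. A minimal Python sketch of the arithmetic follows; the throughput and power figures are invented for illustration and are not measurements from the paper.

```python
def perf_per_watt(ops_per_sec: float, watts: float) -> float:
    """Performance per watt: sustained throughput divided by average power draw."""
    return ops_per_sec / watts

# Hypothetical figures, for illustration only -- not taken from the paper.
baseline_chip = perf_per_watt(ops_per_sec=1.0e12, watts=100.0)  # a notional CPU/GPU baseline
tpu_like_chip = perf_per_watt(ops_per_sec=2.5e13, watts=50.0)   # a notional accelerator

print(f"relative performance/watt: {tpu_like_chip / baseline_chip:.0f}x")  # prints 50x
```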



Published in

Communications of the ACM, Volume 61, Issue 9 (September 2018), 94 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3271489

Copyright © 2018 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Popular
• Refereed
