Roofline: an insightful visual performance model for multicore architectures

Published: 01 April 2009

Abstract

The Roofline model offers insight on how to improve the performance of software and hardware.

Published in

Communications of the ACM, Volume 52, Issue 4
A Direct Path to Dependable Software
April 2009
134 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/1498765

Copyright © 2009 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
