research-article

The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism

Published: 24 February 2014

Abstract

The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts benefit from a few big, high-performance cores, while high active thread counts benefit more from a sea of small, energy-efficient cores.
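To make this trade-off concrete, here is a minimal Python sketch of aggregate throughput versus active thread count for two chips assumed to fit the same power budget: four big cores or sixteen small ones. The relative performance values (`BIG_PERF`, `SMALL_PERF`), the 4:16 core ratio, and the `throughput` helper are hypothetical and chosen only for illustration. They are not taken from the paper.

```python
# Toy model: a few big cores vs. many small cores under one power budget.
# All numbers are hypothetical placeholders, not measurements from the paper.

BIG_PERF = 1.0    # assumed relative single-thread performance of a big core
SMALL_PERF = 0.4  # assumed relative single-thread performance of a small core


def throughput(num_cores: int, per_core_perf: float, active_threads: int) -> float:
    """Aggregate throughput when each core runs at most one thread (no SMT).

    Threads beyond the core count would have to time-share, so in this
    simplified model they add no throughput.
    """
    busy_cores = min(num_cores, active_threads)
    return busy_cores * per_core_perf


if __name__ == "__main__":
    # Assumption: 4 big cores and 16 small cores draw roughly the same power.
    for n in (1, 2, 4, 8, 16):
        big = throughput(4, BIG_PERF, n)
        small = throughput(16, SMALL_PERF, n)
        print(f"{n:2d} threads: 4 big = {big:.1f}, 16 small = {small:.1f}")
```

Under these assumed numbers the big-core chip is ahead below ten active threads and the small-core chip is ahead above ten, which is the tension the abstract describes.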

This paper comprehensively studies the trade-offs in multi-core design given dynamically varying active thread counts. We find that, under these workload conditions, a homogeneous multi-core processor, consisting of a few high-performance SMT cores, typically outperforms heterogeneous multi-cores consisting of a mix of big and small cores (without SMT), within the same power budget. We also show that a homogeneous multi-core performs almost as well as either a heterogeneous multi-core that also implements SMT or a dynamic multi-core, while being less complex to design and verify. Further, heterogeneous multi-cores that power-gate idle cores yield only slightly better energy efficiency than homogeneous multi-cores.
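The same kind of toy model can include SMT. The sketch below, again with made-up numbers, gives each core a hypothetical aggregate-throughput curve (entry k is the total throughput with k co-running threads), places threads greedily on the core with the largest marginal gain, and averages throughput over a hypothetical trace of active thread counts. It compares four big 4-way SMT cores with two big plus eight small non-SMT cores, assuming for illustration only that one big core costs about as much power as four small ones and that SMT adds no power. The curves, the trace, and the helper names are assumptions, not data or methods from the paper.

```python
from typing import List, Sequence

# Hypothetical aggregate-throughput curves: entry k is the total throughput of
# one core running k threads. All values are illustrative assumptions.
BIG_SMT = [0.0, 1.0, 1.4, 1.6, 1.7]  # big core, 4-way SMT
BIG = [0.0, 1.0]                     # big core, no SMT (one thread max)
SMALL = [0.0, 0.4]                   # small core, no SMT (one thread max)

# Assumption: one big core ~ four small cores in power, and SMT is free, so
# both chips fit the same budget of 16 "small-core units".
HOMOGENEOUS_SMT = [BIG_SMT] * 4
HETEROGENEOUS = [BIG] * 2 + [SMALL] * 8


def chip_throughput(cores: Sequence[List[float]], active_threads: int) -> float:
    """Place threads one at a time on the core with the largest marginal gain.

    Because every curve above is concave, this greedy placement maximizes
    total throughput. Threads left over once every hardware context is
    occupied contribute nothing (they would have to time-share).
    """
    load = [0] * len(cores)
    for _ in range(active_threads):
        best_core, best_gain = None, 0.0
        for i, curve in enumerate(cores):
            if load[i] + 1 < len(curve):  # core still has a free context
                gain = curve[load[i] + 1] - curve[load[i]]
                if gain > best_gain:
                    best_core, best_gain = i, gain
        if best_core is None:
            break  # every hardware context is busy
        load[best_core] += 1
    return sum(curve[k] for curve, k in zip(cores, load))


def average_throughput(cores: Sequence[List[float]], trace: Sequence[int]) -> float:
    """Mean throughput over a trace of active thread counts (one entry per interval)."""
    return sum(chip_throughput(cores, n) for n in trace) / len(trace)


if __name__ == "__main__":
    trace = [1, 1, 2, 4, 8, 16]  # hypothetical time-varying active thread counts
    for n in sorted(set(trace)):
        print(f"{n:2d} threads: homogeneous SMT = {chip_throughput(HOMOGENEOUS_SMT, n):.1f}, "
              f"heterogeneous = {chip_throughput(HETEROGENEOUS, n):.1f}")
    print(f"trace average: homogeneous SMT = {average_throughput(HOMOGENEOUS_SMT, trace):.2f}, "
          f"heterogeneous = {average_throughput(HETEROGENEOUS, trace):.2f}")
```

With these particular curves the SMT chip ties the heterogeneous chip at one or two threads and leads beyond that. Weaker SMT scaling or stronger small cores could reverse the ranking, so the sketch only illustrates the shape of the trade-off, not the paper's quantitative results.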

The overall conclusion is that the benefit of SMT in the multi-core era is to provide flexibility with respect to the available thread-level parallelism. Consequently, homogeneous multi-cores with big SMT cores are competitive high-performance, energy-efficient design points for workloads with dynamically varying active thread counts.

    Published in

      ACM SIGARCH Computer Architecture News, Volume 42, Issue 1
      ASPLOS '14
      March 2014
      729 pages
      ISSN: 0163-5964
      DOI: 10.1145/2654822

      ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
      February 2014
      780 pages
      ISBN: 9781450323055
      DOI: 10.1145/2541940

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
