DOI: 10.1145/977091.977152
Article

Predictable performance in SMT processors

Published: 14 April 2004

ABSTRACT

Current instruction fetch policies in SMT processors are oriented towards optimizing overall throughput and/or fairness. However, they provide no control over how individual threads are executed, which leads to performance unpredictability: the IPC of a thread depends on the workload it is executed in and on the fetch policy used.

From the point of view of the Operating System (OS), it is the job scheduler that determines how jobs are executed. However, when the OS runs on an SMT processor, the job scheduler cannot guarantee the execution time constraints of any job because of this performance unpredictability.

In this paper we propose a novel kind of collaboration between the OS and the SMT hardware that enables the OS to enforce that a high priority thread runs at a specific fraction of its full speed. We present an extensive evaluation using many different workloads, which shows that this mechanism gives the required performance in more than 97% of all cases considered, and in more than 99% of the less extreme cases. At the same time, our mechanism does not need to trade off predictability against overall throughput, as it maximizes the IPC of the remaining low priority threads, giving 94% on average (and 97.5% on average for the less extreme cases) of the throughput obtained using instruction fetch policies oriented toward throughput maximization, such as icount.
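The two quantities the abstract reports can be made concrete with a little arithmetic. The sketch below is only an illustration of the metrics, not the mechanism proposed in the paper (which the abstract does not describe). It assumes that "full speed" means the IPC a thread achieves when running alone. The function names and all numbers are hypothetical.

```python
def target_ipc(ipc_alone: float, fraction: float) -> float:
    """IPC the high priority thread must reach to run at `fraction` of full speed.

    Assumption: "full speed" is the IPC of the thread when it runs alone.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return ipc_alone * fraction


def meets_target(measured_ipc: float, ipc_alone: float, fraction: float) -> bool:
    """True if the measured IPC satisfies the requested fraction of full speed."""
    return measured_ipc >= target_ipc(ipc_alone, fraction)


def relative_throughput(ipc_with_mechanism: float, ipc_with_baseline: float) -> float:
    """Throughput as a fraction of a throughput-oriented baseline policy."""
    return ipc_with_mechanism / ipc_with_baseline


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    print(target_ipc(2.0, 0.5))
    print(meets_target(1.1, 2.0, 0.5))
    print(relative_throughput(1.5, 2.0))
```

Read this way, the "more than 97% of all cases" figure counts the workloads for which a check like `meets_target` would succeed, and the 94% figure is an average of a ratio like `relative_throughput` against a baseline such as icount.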

Published in

CF '04: Proceedings of the 1st conference on Computing frontiers
April 2004
522 pages
ISBN: 1581137419
DOI: 10.1145/977091

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 240 of 680 submissions, 35%
