
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm

Published: 1 May 1996
DOI: 10.1145/232973.232993

ABSTRACT

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
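
The fetch-selection idea described above — favoring, each cycle, the threads that are using the processor most efficiently — can be illustrated with a small heuristic sketch. The Python below is written for this summary and is not the paper's implementation; the names (ThreadState, select_fetch_threads), the two-threads-per-cycle fetch limit, and the stall model are illustrative assumptions. It follows the spirit of the paper's instruction-count-based fetch policy: prefer threads with the fewest instructions waiting in the pre-issue stages, since they are least likely to clog the instruction queues.

```python
from dataclasses import dataclass

@dataclass
class ThreadState:
    tid: int
    inflight: int = 0        # instructions fetched but not yet issued (decode/rename/queues)
    icache_miss: bool = False # thread cannot supply instructions this cycle

def select_fetch_threads(threads, fetch_slots=2):
    """Each cycle, grant fetch to the threads with the fewest pre-issue
    instructions in flight -- the ones 'most efficiently using the processor'."""
    runnable = [t for t in threads if not t.icache_miss]
    runnable.sort(key=lambda t: t.inflight)
    return runnable[:fetch_slots]

# Example: four threads; the two with the fewest queued instructions win fetch.
threads = [ThreadState(tid=i, inflight=n) for i, n in enumerate([12, 3, 7, 0])]
print([t.tid for t in select_fetch_threads(threads)])   # -> [3, 1]
```

A priority function like this is cheap to evaluate in hardware (a small sort over per-thread counters) yet adapts every cycle, which is why thread choice can raise throughput without enlarging the superscalar core's structures.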


Published in

ISCA '96: Proceedings of the 23rd Annual International Symposium on Computer Architecture
May 1996, 318 pages
ISBN: 0897917863
DOI: 10.1145/232973

ACM SIGARCH Computer Architecture News, Volume 24, Issue 2
Special Issue: Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA '96)
May 1996, 303 pages
ISSN: 0163-5964
DOI: 10.1145/232974

Copyright © 1996 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
