Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

Authors:
Dean M. Tullsen

Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA

Susan J. Eggers

Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA

Joel S. Emer

Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA

Henry M. Levy

Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA

Jack L. Lo

Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA

Rebecca L. Stamm

Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA


ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture, May 1996, Pages 191–202, https://doi.org/10.1145/232973.232993

Published: 01 May 1996

ABSTRACT

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
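The abstract's closing point, favoring for fetch the threads that are using the processor most efficiently, can be illustrated with a small sketch. This is not the paper's implementation: the `ThreadState` fields, the `pick_fetch_threads` function, and the specific heuristic (prefer the runnable threads with the fewest instructions still waiting in the machine) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int               # hardware thread (context) id
    in_flight: int         # hypothetical count of fetched but not yet issued instructions
    stalled: bool = False  # hypothetical flag, e.g. thread is waiting on a cache miss


def pick_fetch_threads(threads, max_threads=2):
    """Return the ids of up to max_threads threads to fetch from this cycle.

    Assumed heuristic: skip stalled threads, then prefer the threads with the
    fewest in-flight instructions, breaking ties by thread id.
    """
    ready = [t for t in threads if not t.stalled]
    ready.sort(key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in ready[:max_threads]]


if __name__ == "__main__":
    threads = [
        ThreadState(tid=0, in_flight=12),
        ThreadState(tid=1, in_flight=3),
        ThreadState(tid=2, in_flight=7, stalled=True),
        ThreadState(tid=3, in_flight=5),
    ]
    # Thread 2 is skipped because it is stalled; threads 1 and 3 have the
    # fewest in-flight instructions among the rest.
    print(pick_fetch_threads(threads))
```

The idea behind a policy of this shape is that a thread with few instructions waiting is moving through the machine quickly, so giving it fetch bandwidth is less likely to clog shared queues. As a side note on the reported numbers, 5.4 instructions per cycle at a 2.5-fold improvement implies a baseline of roughly 5.4 / 2.5 ≈ 2.2 instructions per cycle for the unmodified superscalar.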

Published in

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture
May 1996, 318 pages
ISBN: 0897917863
DOI: 10.1145/232973
Chairman: Jean-Loup Baer, Univ. of Washington, Seattle

ACM SIGARCH Computer Architecture News, Volume 24, Issue 2
Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
May 1996, 303 pages
ISSN: 0163-5964
DOI: 10.1145/232974
Chairman: Jean-Loup Baer, Univ. of Washington, Seattle
Copyright © 1996 Authors
Publisher
Association for Computing Machinery
New York, NY, United States