Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Authors:
Sriram Vajapeyam
Supercomputer Education and Research Centre and Dept. of Computer Science & Automation, Indian Institute of Science, Bangalore, India 560012

Tulika Mitra
Dept. of Computer Science & Automation, Indian Institute of Science, Bangalore, India 560012
ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture, June 1997, Pages 1–12, https://doi.org/10.1145/264107.264119

Published: 01 May 1997

ABSTRACT

Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless pipeline stages further downstream improve correspondingly. In particular, register renaming a large number of instructions per cycle is difficult. A large instruction window, needed to receive multiple basic blocks per cycle, will slow down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning of the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logical partitioning of the register file into a global file and several local files, the latter holding registers local to a dynamic code sequence; and (iii) dynamic recording and reuse of register renaming information for registers local to a dynamic code sequence. Performance studies show these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 for the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even though the static loop bodies are likely too complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, although the average number of iterations per loop is quite small.
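
The split between a global register file and per-sequence local files in (ii) can be illustrated with a small sketch. The abstract does not define exactly when a register counts as local, so the rule below is an assumption made only for illustration: a register value is treated as local if it is overwritten later in the same dynamic code sequence, and as global if it is read before being written (live-in) or is the last value written to its register (live-out). The names `Instr` and `classify_registers` are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """One instruction of a dynamic code sequence (hypothetical format)."""
    dest: Optional[str]      # destination register, or None (e.g. a branch)
    srcs: Tuple[str, ...]    # source registers


def classify_registers(seq: List[Instr]) -> Tuple[Set[str], int, Set[str]]:
    """Classify register use in one sequence.

    Assumed rule (not taken from the paper):
      live-in    : registers read before any write inside the sequence
      local value: a written value that is overwritten later in the sequence
      live-out   : registers written at least once (their last value may be
                   needed by later sequences)
    """
    live_in: Set[str] = set()
    written: Set[str] = set()
    local_values = 0
    for ins in seq:
        for reg in ins.srcs:
            if reg not in written:
                live_in.add(reg)
        if ins.dest is not None:
            if ins.dest in written:
                # The earlier value of this register dies inside the sequence.
                local_values += 1
            written.add(ins.dest)
    return live_in, local_values, written


example = [
    Instr("r1", ("r2", "r3")),   # r1 = r2 op r3
    Instr("r4", ("r1",)),        # r4 = f(r1)
    Instr("r1", ("r4", "r5")),   # r1 redefined: the first r1 value was local
    Instr(None, ("r1", "r6")),   # branch on r1, r6
]
live_in, local_values, live_out = classify_registers(example)
print(sorted(live_in))    # ['r2', 'r3', 'r5', 'r6']
print(local_values)       # 1
print(sorted(live_out))   # ['r1', 'r4']
```

Under this assumed rule, only the live-in and live-out registers would need entries in the global file, while values that die inside the sequence could be mapped to a local file whose renaming is recorded once and reused when the same sequence recurs, as in (iii).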



Published in
ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture
June 1997
350 pages
ISBN: 0897919017
DOI: 10.1145/264107
Chairmen: Andrew R. Pleszkun (Univ. of Colorado-Boulder, CO), Trevor Mudge (Univ. of Michigan)

ACM SIGARCH Computer Architecture News, Volume 25, Issue 2
Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
May 1997
349 pages
ISSN: 0163-5964
DOI: 10.1145/384286
Editors: Andrew R. Pleszkun (Univ. of Colorado-Boulder, CO), Trevor Mudge (Univ. of Michigan)

Copyright © 1997 Authors
Publisher: Association for Computing Machinery, New York, NY, United States