DOI: 10.1145/264107.264119
Article

Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Published: 01 May 1997

ABSTRACT

Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless pipeline stages further downstream improve correspondingly. In particular, renaming the registers of a large number of instructions per cycle is difficult, and the large instruction window needed to receive multiple basic blocks per cycle slows down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logically partitioning the register file into a global file and several local files, the latter holding registers local to a dynamic code sequence; and (iii) dynamically recording and reusing register renaming information for registers local to a dynamic code sequence. Performance studies show that these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 on the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even though their static loop bodies are likely too complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, although the average number of iterations per loop is quite small.
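The abstract names mechanisms (ii) and (iii) without giving their details. The Python sketch that follows is one illustrative reading, not the paper's design. It assumes a simplified rule: a value is local when the same architectural register is overwritten later in the sequence (so the value cannot be live-out), while the last write to each register and every live-in read go through the global file. The operand mapping is computed once per sequence and reused on later dispatches of the same sequence instead of renaming again. `Instr`, `RenameRecord`, `build_record`, `RenameCache`, and the `seq_id` key are hypothetical names introduced only for illustration.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical operand location: ("global", arch_reg) or ("local", slot).
Operand = Tuple[str, int]


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: optional destination and source arch registers."""
    dest: Optional[int]
    srcs: Tuple[int, ...]


@dataclass
class RenameRecord:
    """Recorded renaming information for one dynamic code sequence."""
    dests: List[Optional[Operand]]
    srcs: List[List[Operand]]
    num_local: int  # local-file slots this sequence needs


def build_record(seq: List[Instr]) -> RenameRecord:
    """Classify every value in `seq` as local or global (simplified rule).

    Assumption: a write is local if the same architectural register is
    written again later in the sequence, so the value cannot be live-out.
    The last write to each register, and every live-in read, is global.
    """
    last_write: Dict[int, int] = {}
    for i, ins in enumerate(seq):
        if ins.dest is not None:
            last_write[ins.dest] = i

    latest: Dict[int, Operand] = {}  # arch reg -> location of its newest value
    dests: List[Optional[Operand]] = []
    srcs: List[List[Operand]] = []
    next_slot = 0
    for i, ins in enumerate(seq):
        # Read the newest in-sequence producer, or the global file for live-ins.
        srcs.append([latest.get(r, ("global", r)) for r in ins.srcs])
        if ins.dest is None:
            dests.append(None)
            continue
        if last_write[ins.dest] == i:
            loc: Operand = ("global", ins.dest)  # may be live-out
        else:
            loc = ("local", next_slot)  # overwritten before the sequence ends
            next_slot += 1
        dests.append(loc)
        latest[ins.dest] = loc
    return RenameRecord(dests, srcs, next_slot)


class RenameCache:
    """Records renaming info per sequence and reuses it on later dispatches."""

    def __init__(self) -> None:
        self.records: Dict[Hashable, RenameRecord] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, seq_id: Hashable, seq: List[Instr]) -> RenameRecord:
        # seq_id is a hypothetical key, e.g. start PC plus branch outcomes.
        rec = self.records.get(seq_id)
        if rec is None:
            self.misses += 1
            rec = build_record(seq)
            self.records[seq_id] = rec
        else:
            self.hits += 1
        return rec
```

For the sequence `r1 = r2 + r3; r1 = r1 * r4; r5 = r1 - r2`, only the first write to `r1` lands in a local slot; the second write to `r1` and the write to `r5` stay global because they may be read after the sequence ends. Numbering local slots per sequence assumes each window block has its own local file, so slot numbers from different blocks never collide. A real design would also have to handle a sequence exited early by a mispredicted branch, which this sketch ignores.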

            Published in

              ACM Conferences
              ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture
              June 1997
              350 pages
              ISBN:0897919017
              DOI:10.1145/264107
              ACM SIGARCH Computer Architecture News, Volume 25, Issue 2
                Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
                May 1997
                349 pages
                ISSN:0163-5964
                DOI:10.1145/384286

              Copyright © 1997 Authors

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              Overall acceptance rate: 543 of 3,203 submissions, 17%
