DOI: 10.5555/243846.243854

Trace cache: a low latency approach to high bandwidth instruction fetching

Published: 02 December 1996

ABSTRACT

As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92 integer benchmarks, a 4 kilobyte trace cache improves performance on average by 28% over conventional sequential fetching. Further, it is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
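As a rough illustration of the idea in the abstract, the following Python sketch models a trace cache as a table keyed by a starting fetch address plus the outcomes of the branches inside the trace. The limits of 16 instructions and 3 branches per trace, the class and method names, and the example addresses are illustrative assumptions rather than details stated in the abstract; this is a toy software model, not the hardware design.

```python
class TraceCache:
    """Toy model: maps (start address, branch outcomes) to a stored trace."""

    def __init__(self, max_instrs=16, max_branches=3):
        # Both limits are assumed values chosen for illustration.
        self.max_instrs = max_instrs
        self.max_branches = max_branches
        self.lines = {}  # (start_addr, outcomes) -> tuple of instruction addresses

    def fill(self, stream):
        """Record one trace from the dynamic stream.

        stream is a list of (addr, taken) pairs in execution order;
        taken is None for non-branch instructions, else True/False.
        """
        addrs, outcomes = [], []
        for addr, taken in stream:
            if len(addrs) == self.max_instrs:
                break
            addrs.append(addr)
            if taken is not None:
                outcomes.append(taken)
                if len(outcomes) == self.max_branches:
                    break  # trace ends at its last allowed branch
        if addrs:
            self.lines[(addrs[0], tuple(outcomes))] = tuple(addrs)

    def lookup(self, fetch_addr, predictions):
        """Return a stored trace that starts at fetch_addr and whose branch
        outcomes equal a prefix of the predictions, or None on a miss."""
        for n in range(len(predictions), -1, -1):
            line = self.lines.get((fetch_addr, tuple(predictions[:n])))
            if line is not None:
                return line
        return None


# Dynamic stream crossing two taken branches: the addresses are not
# contiguous in memory, but the trace stores them side by side.
stream = [
    (0x100, None),
    (0x104, True),   # taken branch to 0x200
    (0x200, None),
    (0x204, False),  # not-taken branch, falls through
    (0x208, None),
    (0x20C, True),   # taken branch, third branch ends the trace
    (0x300, None),
]

tc = TraceCache()
tc.fill(stream)
hit = tc.lookup(0x100, [True, False, True])    # predictions match the trace
miss = tc.lookup(0x100, [False, False, True])  # first prediction differs
```

In this sketch a hit returns all six instruction addresses at once, spanning three basic blocks, while a miss returns `None`, which stands in for falling back to the conventional instruction cache.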
