Trace cache: a low latency approach to high bandwidth instruction fetching

Authors:
Eric Rotenberg, Computer Science Dept., Univ. of Wisconsin - Madison
Steve Bennett, Intel Corporation
James E. Smith, Dept. of Elec. and Comp. Engr., Univ. of Wisconsin - Madison

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, December 1996, Pages 24–35

Published: 02 December 1996

ABSTRACT

As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92 integer benchmarks, a 4 kilobyte trace cache improves performance on average by 28% over conventional sequential fetching. Further, it is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
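
The idea of storing dynamic instruction sequences so that noncontiguous instructions appear contiguous can be illustrated with a toy software model. This is a minimal sketch, not the authors' hardware design: the names `Instr`, `TraceCache`, `fill`, and `fetch` are hypothetical, the limits `max_len` and `max_branches` are illustrative defaults, and identifying a trace by its start address plus the outcomes of the branches inside it is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction in the dynamic stream (hypothetical model)."""
    addr: int
    is_branch: bool = False


class TraceCache:
    """Toy model: maps (start address, branch outcomes) to a stored trace."""

    def __init__(self, max_len=16, max_branches=3):
        # Illustrative limits, assumed for this sketch.
        self.max_len = max_len
        self.max_branches = max_branches
        self.lines = {}

    def fill(self, dynamic_stream):
        """Record one trace from (Instr, taken) pairs in execution order."""
        trace, outcomes = [], []
        for instr, taken in dynamic_stream:
            if len(trace) == self.max_len:
                break
            if instr.is_branch:
                if len(outcomes) == self.max_branches:
                    break
                outcomes.append(taken)
            trace.append(instr.addr)
        if trace:
            self.lines[(trace[0], tuple(outcomes))] = trace
        return trace

    def fetch(self, start_addr, predictions):
        """Return the stored trace on a hit, or None on a miss.

        A hit needs a matching start address and branch predictions that
        agree with the outcomes recorded when the trace was filled.
        """
        preds = tuple(predictions)[: self.max_branches]
        # Try the longest prediction prefix first; a trace may hold
        # fewer branches than the predictor supplies.
        for n in range(len(preds), -1, -1):
            trace = self.lines.get((start_addr, preds[:n]))
            if trace is not None:
                return trace
        return None


# The taken branch at 0x104 jumps to 0x200, so the addresses are not
# contiguous in memory, yet they are stored as one contiguous trace.
stream = [
    (Instr(0x100), False),
    (Instr(0x104, is_branch=True), True),
    (Instr(0x200), False),
    (Instr(0x204, is_branch=True), False),
    (Instr(0x208), False),
]
cache = TraceCache()
cache.fill(stream)
hit = cache.fetch(0x100, [True, False])
miss = cache.fetch(0x100, [False, False])
```

In this sketch a single lookup returns instructions from both sides of the taken branch, while a lookup whose predictions disagree with the recorded path misses, and a real design would then fall back to the conventional instruction cache.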

Index Terms

Trace cache: a low latency approach to high bandwidth instruction fetching
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
December 1996
359 pages
ISBN: 0818676418
Chairmen: Stephen Melvin (Zytek Communications Corp.), Steve Beaty (Hewlett-Packard Corp.)
Copyright © 1996 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Publisher: IEEE Computer Society, United States
Author Tags
instruction cache
instruction fetching
multiple branch prediction
superscalar processors
trace cache
Qualifiers
- Article

Acceptance Rates
Overall Acceptance Rate: 484 of 2,242 submissions, 22%