Next cache line and set prediction

Authors:
Brad Calder
Department of Computer Science, Campus Box 430, University of Colorado, Boulder, CO

Dirk Grunwald
Department of Computer Science, Campus Box 430, University of Colorado, Boulder, CO
ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, July 1995, Pages 287–296, https://doi.org/10.1145/223982.224439

Published: 01 May 1995

ABSTRACT

Accurate instruction fetch and branch prediction are increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of predicting the likely outcome of branch instructions. Several researchers have proposed very effective fetch and branch prediction mechanisms, including branch target buffers (BTBs) that store the target addresses of taken branches. An alternative approach fetches the instruction following a branch by using an index into the cache instead of a branch target address. We call such an index a next cache line and set (NLS) predictor. An NLS predictor is a pointer into the instruction cache, indicating the target instruction of a branch.

In this paper we examine the use of NLS predictors for efficient and accurate fetch and branch prediction. Previous studies associated each NLS predictor with a cache line and provided only one-bit conditional branch predictors. Our study examines the use of NLS predictors with highly accurate two-level correlated conditional branch architectures. We examine the performance of decoupling the NLS predictors from the cache line and storing them in a separate tag-less memory buffer. Our results show that the decoupled architecture performs better than associating the NLS predictors with the cache line, that the NLS architecture benefits from reduced cache miss rates, and that it is particularly effective for programs containing many branches. We also provide an in-depth comparison between the NLS and BTB architectures, showing that the NLS architecture is a competitive alternative to the BTB design.
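The decoupled, tag-less organization described in the abstract can be illustrated with a small model. The following is only a sketch under stated assumptions, not the design evaluated in the paper: the names `NLSEntry` and `NLSTable`, the three-field entry layout (`line`, `set_id`, `offset`), the table size, the cache geometry, and the indexing by low-order branch address bits are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NLSEntry:
    """One NLS predictor: a pointer into the instruction cache (assumed layout)."""
    line: int    # cache line (row) expected to hold the branch target
    set_id: int  # which set of an associative cache holds that line
    offset: int  # instruction slot within the line


class NLSTable:
    """Tag-less, direct-mapped buffer of NLS predictors, decoupled from the cache."""

    def __init__(self, entries: int = 1024, num_lines: int = 256,
                 line_bytes: int = 32, instr_bytes: int = 4) -> None:
        # All sizes are illustrative assumptions, not values from the paper.
        self.entries = entries
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.instr_bytes = instr_bytes
        self.table: List[Optional[NLSEntry]] = [None] * entries

    def _index(self, branch_pc: int) -> int:
        # No tag is stored, so branches that share these low-order bits alias.
        return (branch_pc // self.instr_bytes) % self.entries

    def predict(self, branch_pc: int) -> Optional[NLSEntry]:
        """Return the predicted cache location of the target, or None if untrained."""
        return self.table[self._index(branch_pc)]

    def update(self, branch_pc: int, target_pc: int, set_id: int) -> None:
        """Record where the resolved target of a taken branch lives in the cache."""
        line = (target_pc // self.line_bytes) % self.num_lines
        offset = (target_pc % self.line_bytes) // self.instr_bytes
        self.table[self._index(branch_pc)] = NLSEntry(line, set_id, offset)


table = NLSTable()
table.update(branch_pc=0x1000, target_pc=0x2048, set_id=1)
print(table.predict(0x1000))
```

In this model an entry holds only a few bits locating the target inside the cache, whereas a BTB entry would hold a full target address plus a tag. The cost is that a prediction is only useful while the target still resides at the recorded cache location, which is one way to read the abstract's remark that the NLS architecture benefits from reduced cache miss rates.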

Index Terms

Next cache line and set prediction
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
      2. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. General and reference
  1. Cross-computing tools and techniques
    1. Performance

Published in
ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture
July 1995
426 pages
ISBN:0897916980
DOI:10.1145/223982
Chairman:
David A. Patterson
Univ. of California, Berkeley
ACM SIGARCH Computer Architecture News Volume 23, Issue 2
Special Issue: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
May 1995
412 pages
ISSN:0163-5964
DOI:10.1145/225830
Chairman:
David A. Patterson
Univ. of California, Berkeley
Copyright © 1995 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 543 of 3,203 submissions, 17%