ABSTRACT
Accurate instruction fetch and branch prediction is increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of predicting the likely outcome of branch instructions. Several researchers have proposed very effective fetch and branch prediction mechanisms, including branch target buffers (BTBs) that store the target addresses of taken branches. An alternative approach fetches the instruction following a branch by using an index into the cache instead of a branch target address. We call such an index a next cache line and set (NLS) predictor. An NLS predictor is a pointer into the instruction cache, indicating the target instruction of a branch.

In this paper we examine the use of NLS predictors for efficient and accurate fetch and branch prediction. Previous studies associated each NLS predictor with a cache line and provided only one-bit conditional branch predictors. Our study examines the use of NLS predictors with highly accurate two-level correlated conditional branch architectures. We examine the performance of decoupling the NLS predictors from the cache line and storing them in a separate tag-less memory buffer. Our results show that the decoupled architecture performs better than associating the NLS predictors with the cache line, that the NLS architecture benefits from reduced cache miss rates, and that it is particularly effective for programs containing many branches. We also provide an in-depth comparison between the NLS and BTB architectures, showing that the NLS architecture is a competitive alternative to the BTB design.
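The decoupled, tag-less organization described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names, the field layout (`line`, `cache_set`), the table size, and the indexing by low-order bits of the branch address are all assumptions made for illustration. The sketch shows only the key property that an entry holds a pointer into the instruction cache rather than a full target address, and that without tags two branches mapping to the same entry share a prediction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NLSEntry:
    """Hypothetical NLS predictor: a pointer into the instruction cache."""
    line: int       # assumed: index of the cache line holding the target
    cache_set: int  # assumed: which member of the associative cache to read


class TaglessNLSTable:
    """Sketch of a decoupled, tag-less NLS buffer (all sizes are assumptions)."""

    def __init__(self, num_entries: int = 1024, instr_bytes: int = 4) -> None:
        # A power-of-two size keeps the index a simple bit selection.
        if num_entries <= 0 or num_entries & (num_entries - 1):
            raise ValueError("num_entries must be a power of two")
        self.num_entries = num_entries
        self.instr_bytes = instr_bytes
        self.entries: List[Optional[NLSEntry]] = [None] * num_entries

    def _index(self, branch_pc: int) -> int:
        # Low-order bits of the instruction address select the entry.
        return (branch_pc // self.instr_bytes) % self.num_entries

    def predict(self, branch_pc: int) -> Optional[NLSEntry]:
        # No tag comparison: whatever sits in the entry is the prediction.
        return self.entries[self._index(branch_pc)]

    def update(self, branch_pc: int, line: int, cache_set: int) -> None:
        # Record where the taken branch's target was found in the cache.
        self.entries[self._index(branch_pc)] = NLSEntry(line, cache_set)


table = TaglessNLSTable(num_entries=4)
table.update(0x1000, line=7, cache_set=1)
hit = table.predict(0x1000)      # the branch that trained the entry
alias = table.predict(0x1010)    # a different branch with the same index
empty = table.predict(0x1004)    # a neighbouring entry, never trained
```

In this sketch `alias` returns the same entry as `hit`, because no tag distinguishes the two branches, whereas a BTB would store a tag and the full target address in each entry.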