Article

Focusing processor policies via critical-path prediction

Authors:
Brian Fields

Computer Sciences Department, University of Wisconsin-Madison

Computer Sciences Department, University of Wisconsin-Madison
View Profile

,
Shai Rubin

Computer Sciences Department, University of Wisconsin-Madison

Computer Sciences Department, University of Wisconsin-Madison
View Profile

,
Rastislav Bodík

Computer Sciences Department, University of Wisconsin-Madison

Computer Sciences Department, University of Wisconsin-Madison
View Profile

ISCA '01: Proceedings of the 28th annual international symposium on Computer architectureJune 2001Pages 74–85https://doi.org/10.1145/379240.379253

Published:01 May 2001Publication History

ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture

Pages 74–85

ABSTRACT

Although some instructions hurt performance more than others, current processors typically apply scheduling and speculation as if each instruction was equally costly. Instruction cost can be naturally expressed through the critical path: if we could predict it at run-time, egalitarian policies could be replaced with cost-sensitive strategies that will grow increasingly effective as processors become more parallel.

This paper introduces a hardware predictor of instruction criticality and uses it to improve performance. The predictor is both effective and simple in its hardware implementation. The effectiveness at improving performance stems from using a dependence-graph model of the microarchitectural critical path that identifies execution bottlenecks by incorporating both data and machine-specific dependences. The simplicity stems from a token-passing algorithm that computes the critical path without actually building the dependence graph.

By focusing processor policies on critical instructions, our predictor enables a large class of optimizations. It can (i) give priority to critical instructions for scarce resources (functional units, ports, predictor entries); and (ii) suppress speculation on non-critical instructions, thus reducing “useless” misspeculations. We present two case studies that illustrate the potential of the two types of optimization, we show that (i) critical-path-based dynamic instruction scheduling and steering in a clustered architecture improves performance by as much as 21% (10% on average); and (ii) focusing value prediction only on critical instructions improves performance by as much as 5%, due to removing nearly half of the misspeculations.

References

1.V. Agarwal, M.S. Hrishikesh, S. Keckler, and D. Burger. Clock rate versus IPC: The end of the road for conventional microarchitectures. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA '00), Vancouver, June 10-14 2000. Google ScholarDigital Library
2.A. Baniasadi and A. Moshovos. Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors. In Proceedings of the 33th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-O0), pages 337-347, December 10-13 2000. Google ScholarDigital Library
3.P. Barford and M. Crovella. Critical path analysis of TCP transactions. In Proceedings qf ACM SIGCOMM 2000, January 2000. Google ScholarDigital Library
4.D. C. Burger and T. M. Austin. The simplescalar tool set, version 2.0. Technical Report CS-TR-1997-1342, University of Wisconsin, Madison, June 1997.Google ScholarDigital Library
5.B. Calder, G. Reinman, and D. Tullsen. Selective value prediction. In Proceedings of the 26th Annual International Symposium on Computer Architecture (1SCA '99), pages 64-75, New York, N.Y., May 1-5 1999. Google ScholarDigital Library
6.K.I. Farkas, P. Chow, N. P. Jouppi, and Z. Vranesic. The multicluster architecture: Reducing cycle time through partitioning. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-97), pages 149-159, Los Alamitos, December 1-3 1997. Google ScholarDigital Library
7.B. R. Fisk and R. I. Bahar. The non-critical buffer: Using load latency tolerance to improve data cache efficiency. In IEEE International ConJerence on Computer Design, Austin, TX, 1999. Google ScholarDigital Library
8.L. Gwennap. Digital 21264 sets new standard. Microprocessor Report, 10:9-15, October 1996.Google Scholar
9.J. K. Hollingsworth. Critical path profiling of message passing and shared-memory programs. IEEE Transactions on Parallel and Distributed Systems, 9(10), October 1998. Google ScholarDigital Library
10.G. Kemp and M. Franklin. PEWs: A decentralized dynamic scheduling algorithm for ILP processing. In International Conference on Parallel Processing, pages 239-246, Aug 1996.Google Scholar
11.M. H. Lipasti and J. P. Shen. Exceeding the dataflow limit via value prediction. In Proceedings qfthe 29th Annual lnternational Symposium on Microarchitecture, pages 226-237, Paris, France, December 2-4, 1996. Google ScholarDigital Library
12.S. Palacharla, N. P. Jouppi, and J. E. Smith. Complexityeffective superscalar processors. In 24th Annual International Symposium on Computer Architecture, pages 206-218, 1997. Google ScholarDigital Library
13.E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. Smith. Trace processors. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-97), pages 138-148, Los Alamitos, December 1-3 1997. Google ScholarDigital Library
14.A. Roth and GS. Sohi. Speculative data-driven muhithreading. In Proceedings of the Seventh International Symposium on High-PerJormance Computer Architecture, Jan 2001. Google ScholarDigital Library
15.M. Schlansker, V. Kathail, and S. Anik. Height reduction of control recurrences for ILP processors. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 40-51, San Jose, California, November 30-December 2, 1994. Google ScholarDigital Library
16.M. Schlansker, S. Mahlke, and R. Johnson. Control CPR: A branch height reduction optimization for EPIC architectures. In Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, pages 155- 168, 1999. Google ScholarDigital Library
17.S. T. Srinivasan, R. Dz ching Ju, A. R. Lebeck, and C. Wilkerson. Locality vs. criticality. In Proceedings of the 28th Annual International Symposium on Computer Architecture, June 2001. Google ScholarDigital Library
18.S. T. Srinivasan and A. R. Lebeck. Load latency tolerance in dynamically scheduled processors. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-98), pages 148-159, Los Alamitos, November 30-December 2 1998. Google ScholarDigital Library
19.J. Stark, P. Racunas, and Y. N. Patt. Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-97), pages 34-45, Los Alamitos, December 1-3 1997. Google ScholarDigital Library
20.D. Tullsen and B. Calder. Computing along the critical path. Technical report, University of California, San Diego, Oct 1998.Google Scholar
21.E. Tune, D. Liang, D. M. Tullsen, and B. Calder. Dynamic prediction of critical path instructions. In Proceedings of the Seventh International Symposium on High-Performance Computer Architecture, Jan 2001. Google ScholarDigital Library
22.K. Wang and M. Franklin. Highly accurate data value prediction using hybrid predictors. In Proceedings f'the 30th Annual IEEE/ACM International S),mposium on Microarchitecture (MICRO-97), pages 281-291, Los Alamitos, December 1- 3 1997. Google ScholarDigital Library
23.A. Yoaz, M. Erez, R. Ronen, and S. Jourdan. Speculative techniques for improving load related instruction scheduling. In Proceedings (of the 26th Annual International Symposium on Computer Architecture, June 1999. Google ScholarDigital Library

Index Terms

Recommendations

Focusing processor policies via critical-path prediction
Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)

Although some instructions hurt performance more than others, current processors typically apply scheduling and speculation as if each instruction was equally costly. Instruction cost can be naturally expressed through the critical path: if we could ...
Read More
Critical-path aware processor architectures
Read More
Difficult-path branch prediction using subordinate microthreads
Special Issue: Proceedings of the 29th annual international symposium on Computer architecture (ISCA '02)

Branch misprediction penalties continue to increase as microprocessor cores become wider and deeper. Thus, improving branch prediction accuracy remains an important challenge. Simultaneous Subordinate Microthreading (SSMT) provides a means to improve ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
June 2001
289 pages
ISBN:0769511627
DOI:10.1145/379240
Chairman:
Per Stenström
Chalmers Univ. of Technology
ACM SIGARCH Computer Architecture News Volume 29, Issue 2
Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
May 2001
262 pages
ISSN:0163-5964
DOI:10.1145/384285
Editor:
Per Stenström
Chalmers Univ. of Technology
Issue’s Table of Contents
Copyright © 2001 Authors
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 2001
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
ISCA '01 Paper Acceptance Rate24of163submissions,15%Overall Acceptance Rate543of3,203submissions,17%
More
Upcoming Conference
ISCA '24

Sponsor:

sigarch

ISCA '24: The 51st Annual International Symposium on Computer Architecture

June 29 - July 3, 2024

Buenos Aires , Argentina
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 227
  Total Citations
  View Citations
- 1,737
  Total Downloads
- Downloads (Last 12 months)87
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Focusing processor policies via critical-path prediction

ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture

ABSTRACT

References

Cited By

Index Terms

Recommendations

Focusing processor policies via critical-path prediction

Critical-path aware processor architectures

Difficult-path branch prediction using subordinate microthreads