Article

Optimizing pipelines for power and performance

Authors:
Viji Srinivasan, David Brooks, Michael Gschwind, Pradip Bose, Victor Zyuban, Philip N. Strenski, Philip G. Emma

IBM T.J. Watson Research Center, Yorktown Heights, NY

MICRO 35: Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 2002, Pages 333–344

Published: 18 November 2002

ABSTRACT

During the concept and definition phases of next-generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive the optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation-based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
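
As a reading aid, the sketch below converts a clock period expressed in FO4 (fan-out-of-four inverter delays) into a clock frequency. It is only unit arithmetic, not the paper's model: the function name and the FO4 delay values are hypothetical, chosen for illustration, since the absolute FO4 delay depends on the process technology and is not given in the abstract.

```python
def clock_frequency_ghz(period_fo4: float, fo4_delay_ps: float) -> float:
    """Convert a clock period in FO4 delays to a frequency in GHz.

    period_fo4   : clock period measured in FO4 inverter delays
    fo4_delay_ps : delay of one FO4 inverter in picoseconds
                   (technology dependent; an assumed input here)
    """
    if period_fo4 <= 0 or fo4_delay_ps <= 0:
        raise ValueError("period and FO4 delay must be positive")
    period_ps = period_fo4 * fo4_delay_ps
    # A period of 1000 ps corresponds to 1 GHz.
    return 1000.0 / period_ps


if __name__ == "__main__":
    # Hypothetical FO4 delays (ps); these are illustrative assumptions,
    # not values taken from the paper.
    for fo4_delay_ps in (25.0, 40.0, 65.0):
        f = clock_frequency_ghz(18.0, fo4_delay_ps)
        print(f"18 FO4 at {fo4_delay_ps:.0f} ps per FO4: {f:.2f} GHz")
```

Because the FO4 delay scales with the process, stating the period in FO4 keeps the result technology independent; the same 18 FO4 design point maps to different absolute frequencies in different technologies.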

Published in
MICRO 35: Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
November 2002
442 pages
ISBN: 0769518591
Conference Chair: Erik Altman (IBM)
General Chair: Kemal Ebcioǧlu (IBM)
Program Chairs: Scott Mahlke (University of Michigan), B. Ramakrishna Rau (Hewlett-Packard Laboratories)
Publications Chair: Sanjay Patel (University of Illinois)
Publisher: IEEE Computer Society Press, Washington, DC, United States