ABSTRACT
During the concept and definition phases of next-generation high-end processors, power and performance must be weighed appropriately to deliver competitive cost/performance. A CPI-centric view alone is not enough in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive the optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation-based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results on a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4 (fanout-of-four inverter delays). We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
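To make the trade-off concrete, here is a minimal toy sketch of an analytical pipeline-depth study. It is not the paper's model or its energy equations. It assumes a simple hazard-stall formulation for time per instruction, a latch count that grows as depth raised to a hypothetical exponent `beta`, and a cubic BIPS^3/W-style figure of merit for the power-aware case. Every parameter value (`t_p`, `t_o`, `hazard`, `gamma`, `e_latch`, `leak`, `beta`, and so on) and every name (`PipelineModel`, `best_depth`, `perf_only`, `bips3_per_watt`) is an illustrative placeholder, so the resulting depths should not be read as reproducing the 18 FO4 result.

```python
"""Toy analytical power-performance model for choosing pipeline depth.

All parameter values are hypothetical placeholders, not figures from the paper.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineModel:
    t_p: float = 140.0      # total logic depth of the unpipelined path, FO4 (hypothetical)
    t_o: float = 2.5        # latch overhead per stage, FO4 (hypothetical)
    alpha: float = 2.0      # ideal instructions completed per cycle (hypothetical)
    hazard: float = 0.1     # stalling hazards per instruction (hypothetical)
    gamma: float = 0.5      # fraction of the pipeline drained per hazard (hypothetical)
    e_logic: float = 1.0    # logic energy per instruction, independent of depth (hypothetical)
    e_latch: float = 0.025  # clock/latch energy per latch unit per cycle (hypothetical)
    leak: float = 0.001     # leakage power per latch unit (hypothetical)
    beta: float = 1.1       # assumed latch growth: latch count ~ depth ** beta

    def cycle_time(self, p: int) -> float:
        # Logic is split evenly across p stages; each stage pays the latch overhead.
        return self.t_o + self.t_p / p

    def time_per_instr(self, p: int) -> float:
        # Busy time plus stall time; each hazard costs a fraction gamma
        # of the full pipeline latency p * t_o + t_p.
        busy = self.cycle_time(p) / self.alpha
        stall = self.hazard * self.gamma * (p * self.t_o + self.t_p)
        return busy + stall

    def performance(self, p: int) -> float:
        # Instructions per FO4 time unit, standing in for BIPS.
        return 1.0 / self.time_per_instr(p)

    def power(self, p: int) -> float:
        latches = p ** self.beta
        logic = self.e_logic * self.performance(p)           # scales with throughput
        clock = self.e_latch * latches / self.cycle_time(p)  # latches clocked every cycle
        leakage = self.leak * latches                        # grows with latch count
        return logic + clock + leakage


def best_depth(model, metric, depths=range(1, 61)):
    """Return the depth in `depths` that maximizes metric(model, p)."""
    return max(depths, key=lambda p: metric(model, p))


def perf_only(m, p):
    # CPI/frequency-only view: ignore power entirely.
    return m.performance(p)


def bips3_per_watt(m, p):
    # Cubic performance weighting divided by power (BIPS^3/W-style metric).
    return m.performance(p) ** 3 / m.power(p)


if __name__ == "__main__":
    model = PipelineModel()
    for name, metric in [("performance only", perf_only), ("BIPS^3/W", bips3_per_watt)]:
        p = best_depth(model, metric)
        print(f"{name}: {p} stages, cycle time {model.cycle_time(p):.1f} FO4")
```

With these made-up parameters, the power-aware metric selects a noticeably shallower pipeline (a longer clock period in FO4) than the performance-only view, because latch and clock power keep rising with depth after the performance gains have flattened out. An exhaustive sweep over integer depths is used instead of a closed-form optimum, so the energy terms can be swapped out without rederiving anything.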
Index Terms
- Optimizing pipelines for power and performance