DOI: 10.5555/774861.774897
Article

Optimizing pipelines for power and performance

Published: 18 November 2002

ABSTRACT

During the concept phase and definition of next-generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive the optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation-based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
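
The abstract does not reproduce the model itself, so the following is only a toy sketch of the kind of trade-off it describes, not the authors' equations. Every function name and constant here is an assumption made for illustration: a fixed total logic delay split evenly across stages, a fixed latch overhead per stage, a CPI that grows linearly with depth because of hazards, dynamic power proportional to latch count times frequency, and a performance^3/power figure of merit. Leakage is ignored.

```python
# Toy pipeline-depth model. All constants are illustrative placeholders,
# not values taken from the paper.

def cycle_time_fo4(depth, logic_fo4=140.0, latch_fo4=3.0):
    """Clock period in FO4: logic split evenly over stages plus latch overhead."""
    return logic_fo4 / depth + latch_fo4

def time_per_instruction(depth, base_cpi=0.6, stall_per_stage=0.02):
    """Average time per instruction (FO4); hazard stalls grow with depth."""
    cpi = base_cpi + stall_per_stage * depth
    return cpi * cycle_time_fo4(depth)

def relative_power(depth, latch_growth=1.1):
    """Dynamic power up to a constant: latch count (depth**growth) times frequency."""
    return depth ** latch_growth / cycle_time_fo4(depth)

def best_depth(exponent, use_power, depths=range(1, 61)):
    """Depth that maximizes performance**exponent / power over the scanned range."""
    def score(depth):
        perf = 1.0 / time_per_instruction(depth)
        power = relative_power(depth) if use_power else 1.0
        return perf ** exponent / power
    return max(depths, key=score)

perf_only = best_depth(exponent=1, use_power=False)
power_aware = best_depth(exponent=3, use_power=True)
print("performance-only depth:", perf_only, "stages,",
      round(cycle_time_fo4(perf_only), 1), "FO4 per stage")
print("power-aware depth:", power_aware, "stages,",
      round(cycle_time_fo4(power_aware), 1), "FO4 per stage")
```

Under these assumptions the power-aware optimum is a shallower pipeline with a longer clock period than the performance-only optimum, which is the qualitative effect the abstract reports. The specific numbers depend entirely on the placeholder constants and should not be read as a reproduction of the paper's result.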
