Power Awareness through Selective Dynamically Optimized Traces

Published: 02 March 2004

Abstract

We present the PARROT concept that seeks to achieve higher performance with reduced energy consumption through gradual optimization of frequently executed code traces. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a selective approach for applying complex mechanisms only upon the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness.

We show that the PARROT based microarchitecture can improve the performance of aggressively designed processors by providing the means to improve the utilization of their more elaborate resources. At the same time, rigorous selection of traces prior to storage and optimization provides the key to attenuating increases in the power budget.

For resource-constrained designs, PARROT based architectures deliver better performance (up to an average 16% increase in IPC) at a comparable energy level, whereas the conventional path to a similar performance improvement consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the cubic-MIPS-per-WATT power awareness metric by over 50%.
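
To make the last figure concrete, the sketch below works through the arithmetic of a cubic-MIPS-per-Watt metric, taken here to mean MIPS cubed divided by power. It is an illustration and not the authors' evaluation method. It assumes both machines run at the same clock frequency, so that the MIPS ratio equals the IPC ratio; the abstract does not state this. The function names are hypothetical.

```python
def cubic_mips_per_watt(mips: float, watts: float) -> float:
    """Power-awareness metric: performance cubed divided by power."""
    return mips ** 3 / watts


def max_power_ratio(perf_ratio: float, metric_ratio: float) -> float:
    """Largest power increase (as a ratio) that still yields the given
    improvement in MIPS^3/W for a given performance ratio.

    From metric_ratio = perf_ratio**3 / power_ratio.
    """
    return perf_ratio ** 3 / metric_ratio


# Assumption: equal clock frequency, so a 45% IPC gain is a 1.45x MIPS gain.
# A metric improvement of 50% (1.50x) then bounds the power increase.
bound = max_power_ratio(perf_ratio=1.45, metric_ratio=1.50)
print(f"power may grow by at most about {bound:.2f}x")
```

Under that assumption, a 45% performance gain combined with a metric improvement of more than 50% implies that power grows by less than roughly twice the baseline. The cubic weighting is what lets a wider machine spend more power and still score better on the metric.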

Published in

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2 (ISCA 2004)
March 2004
373 pages
ISSN: 0163-5964
DOI: 10.1145/1028176

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN: 0769521436

Copyright © 2004 Authors

Publisher

Association for Computing Machinery
New York, NY, United States