Power Awareness through Selective Dynamically Optimized Traces

Authors: Roni Rosner, Yoav Almog, Micha Moffie, Naftali Schwartz, Avi Mendelson

Intel Labs, Haifa, Israel

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2, March 2004
https://doi.org/10.1145/1028176.1006715

Published: 02 March 2004

Abstract

We present the PARROT concept that seeks to achieve higher performance with reduced energy consumption through gradual optimization of frequently executed code traces. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a selective approach for applying complex mechanisms only upon the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness.

We show that the PARROT-based microarchitecture can improve the performance of aggressively designed processors by providing the means to improve the utilization of their more elaborate resources. At the same time, rigorous selection of traces prior to storage and optimization provides the key to attenuating increases in the power budget.

For resource-constrained designs, PARROT-based architectures deliver better performance (up to an average 16% increase in IPC) at a comparable energy level, whereas the conventional path to a similar performance improvement consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the cubic-MIPS-per-WATT power awareness metric by over 50%.
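To make the closing figures of the abstract concrete, the following minimal sketch works out what a 45% IPC gain together with a 50% gain in cubic-MIPS-per-Watt would imply for power. It is an illustration only, not taken from the paper: it assumes that MIPS scales in proportion to IPC (equal clock frequency and instruction count on both machines), and the function names are hypothetical.

```python
# Illustrative arithmetic only; not the authors' evaluation methodology.
# Assumption: MIPS is proportional to IPC (same clock, same instruction count),
# so a 1.45x IPC ratio is treated as a 1.45x MIPS ratio.

def mips3_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative change in the MIPS^3/Watt metric for given performance and power ratios."""
    return perf_ratio ** 3 / power_ratio


def max_power_ratio(perf_ratio: float, metric_ratio: float) -> float:
    """Largest power ratio that still yields the given MIPS^3/Watt improvement."""
    return perf_ratio ** 3 / metric_ratio


# Figures quoted in the abstract: +45% IPC, metric improved by over 50%.
bound = max_power_ratio(1.45, 1.50)
print(f"power may grow by at most {bound:.2f}x")  # prints 2.03x
```

Under these assumptions, the quoted numbers are consistent with the wider machine drawing up to roughly twice the power of the baseline, because the cubic weighting of performance rewards IPC gains far more heavily than it penalizes added power.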

Published in

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2 (ISCA 2004), March 2004, 373 pages
ISSN: 0163-5964
DOI: 10.1145/1028176

ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture, June 2004, 373 pages
ISBN: 0769521436

Copyright © 2004 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
Published: 2 March 2004