ABSTRACT
Miniaturization of devices and the ensuing decrease in threshold voltage have led to a substantial increase in the leakage component of total processor energy consumption. Because VLIW and clustered VLIW architectures have relatively simple issue logic and a large number of functional units, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures, where contention for the limited number of slow intercluster communication channels leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from the active mode to the sleep mode and vice versa, which adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slack of instructions to orchestrate the functional unit mapping, with the objective of reducing the number of transitions in functional units and thereby keeping them off for longer durations. For a VLIW architecture, the proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units over a hardware-only scheme, with negligible performance degradation. The benefits are 15% and 17% for a 2-clustered and a 4-clustered VLIW architecture, respectively. Our test bed uses the Trimaran compiler infrastructure.
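The idea of using scheduling slack to reduce active/sleep transitions can be illustrated with a toy model. The sketch below is not the algorithm evaluated in the paper: it assumes a single functional unit, ignores data dependences and intercluster communication, and represents each instruction only by a hypothetical (earliest, latest) window of legal issue cycles. All function names are invented for illustration. It compares an as-soon-as-possible placement with a greedy placement that uses slack to cluster busy cycles together.

```python
from typing import List, Tuple

# Hypothetical representation: (earliest, latest) legal issue cycle, inclusive.
Window = Tuple[int, int]


def count_transitions(busy: List[bool]) -> int:
    """Count busy<->idle changes between consecutive cycles of one unit."""
    return sum(1 for prev, cur in zip(busy, busy[1:]) if prev != cur)


def schedule_asap(windows: List[Window], horizon: int) -> List[bool]:
    """Baseline: put each instruction in the first free cycle of its window."""
    busy = [False] * horizon
    for earliest, latest in windows:
        for cycle in range(earliest, latest + 1):
            if not busy[cycle]:
                busy[cycle] = True
                break
        else:
            raise ValueError("no free cycle inside the slack window")
    return busy


def schedule_slack_aware(windows: List[Window], horizon: int) -> List[bool]:
    """Greedy sketch: place the least flexible instructions first, then put
    each remaining one in the free cycle of its window that yields the
    fewest transitions so far (ties go to the earliest cycle)."""
    busy = [False] * horizon
    for earliest, latest in sorted(windows, key=lambda w: w[1] - w[0]):
        free = [c for c in range(earliest, latest + 1) if not busy[c]]
        if not free:
            raise ValueError("no free cycle inside the slack window")

        def cost(cycle: int) -> int:
            trial = busy.copy()
            trial[cycle] = True
            return count_transitions(trial)

        busy[min(free, key=cost)] = True
    return busy


if __name__ == "__main__":
    example = [(0, 2), (2, 2), (5, 7), (7, 7)]  # made-up slack windows
    print(count_transitions(schedule_asap(example, 8)))         # 6
    print(count_transitions(schedule_slack_aware(example, 8)))  # 3
```

In this made-up example, both placements issue the same four instructions within their slack windows, but the slack-aware placement groups them into two busy runs, so the unit changes state three times instead of six.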