ABSTRACT
Embedded system programs tend to spend much of their execution time in small loops. Introducing a very small loop cache into the instruction memory hierarchy has thus been shown to substantially reduce instruction fetch energy. However, loop caches come in many sizes and variations, and using the configuration that is best on average may actually increase energy consumption for a specific program. We therefore introduce a loop cache exploration tool that analyzes a particular program's profile, rapidly explores the possible configurations, and generates the configuration with the greatest energy savings. We introduce a simulation-based approach and show the energy savings that a customized loop cache yields. We also introduce a fast estimation-based approach that obtains nearly the same results in seconds rather than tens of minutes or hours.
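The exploration idea described above can be illustrated with a minimal sketch. It is not the paper's simulation-based or estimation-based method: the `Loop` profile record, the per-fetch energy constants, and the rule that a loop is served from the loop cache only when its whole body fits are all simplifying assumptions made for illustration, and the energy values are in arbitrary units.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Loop:
    """One loop from a (hypothetical) program profile."""
    size: int     # loop body size, in instructions
    fetches: int  # instructions fetched while executing this loop


# Assumed per-fetch energy of regular instruction memory (arbitrary units).
MAIN_MEMORY_ENERGY = 10.0


def loop_cache_energy(entries: int) -> float:
    """Assumed per-fetch energy of a loop cache; grows with its size."""
    return 0.5 + 0.01 * entries


def estimate_energy(loops: Sequence[Loop], other_fetches: int, entries: int) -> float:
    """Estimate total fetch energy for a loop cache with `entries` instructions.

    Assumption: fetches from a loop hit the loop cache only if the whole
    loop body fits; every other fetch goes to regular instruction memory.
    """
    total = other_fetches + sum(loop.fetches for loop in loops)
    cached = sum(loop.fetches for loop in loops if loop.size <= entries)
    return cached * loop_cache_energy(entries) + (total - cached) * MAIN_MEMORY_ENERGY


def explore(loops: Sequence[Loop], other_fetches: int,
            candidates: Iterable[int]) -> Tuple[int, float]:
    """Try every candidate size and return (best_size, best_energy)."""
    return min(
        ((entries, estimate_energy(loops, other_fetches, entries))
         for entries in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    profile = [Loop(size=12, fetches=8000), Loop(size=40, fetches=1000)]
    best_size, best_energy = explore(profile, other_fetches=1000,
                                     candidates=(0, 8, 16, 32, 64))
    print(f"best loop cache size: {best_size} entries, energy: {best_energy:.1f}")
```

Under these assumed numbers, a program whose only hot loop is small is best served by a small loop cache, while a program with a second, larger loop may favor a larger one. This is the kind of per-program difference that motivates exploring configurations rather than fixing one that is best on average.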
Index Terms
- Synthesis of customized loop caches for core-based embedded systems