ABSTRACT
Profiling an application executing on a microprocessor is part of the solution to numerous software and hardware optimization and design automation problems. Most current profiling techniques suffer from runtime overhead, inaccuracy, or slowness, and the traditional non-intrusive method of using a logic analyzer does not work for today's systems-on-a-chip, whose embedded cores are not accessible from outside the chip. We introduce a novel on-chip memory architecture that overcomes these limitations. The architecture, which we call ProMem, is based on a pipelined binary tree structure. It achieves single-cycle throughput, so it can keep up with today's fastest pipelined processors. It can also be laid out efficiently and scales very well, becoming more efficient the larger it gets. The memory can be used in a wide variety of common profiling situations, such as instruction profiling, value profiling, and network traffic profiling, which in turn can guide numerous design automation tasks.
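The abstract does not describe ProMem's internals, so the following is only a rough software model of how a pipelined binary tree can sustain one lookup per cycle, not the authors' design. It assumes a fixed, pre-loaded set of watched patterns stored in a balanced binary search tree, with one tree level per pipeline stage and one counter per pattern. The class name `TreeProfilerModel` and all of its fields are hypothetical.

```python
class TreeProfilerModel:
    """Cycle-level software model of a tree-organized counting memory.

    Illustrative assumption only: the watched patterns are fixed up front
    and each tree level acts as one pipeline stage.
    """

    def __init__(self, patterns):
        keys = sorted(set(patterns))
        if not keys:
            raise ValueError("need at least one pattern to watch")
        self.nodes = {}  # heap-style index -> key; the root is index 1
        self._build(keys, 1)
        # Number of tree levels, which is also the number of pipeline stages.
        self.depth = max(i.bit_length() for i in self.nodes)
        self.counts = {k: 0 for k in keys}  # one counter per watched pattern
        self.misses = 0  # observed values that match no watched pattern
        self.cycles = 0
        # Registers between adjacent levels; each holds (value, node index).
        self.regs = [None] * (self.depth - 1)

    def _build(self, keys, index):
        """Place the median at this node so the tree stays balanced."""
        if not keys:
            return
        mid = len(keys) // 2
        self.nodes[index] = keys[mid]
        self._build(keys[:mid], 2 * index)
        self._build(keys[mid + 1:], 2 * index + 1)

    def clock(self, value=None):
        """Advance one cycle; `value` enters the root stage (None is a bubble)."""
        stage_inputs = [None if value is None else (value, 1)] + self.regs
        next_regs = [None] * (self.depth - 1)
        for d, item in enumerate(stage_inputs):
            if item is None:
                continue
            v, idx = item
            key = self.nodes[idx]
            if v == key:
                self.counts[key] += 1  # hit: bump this pattern's counter
            else:
                child = 2 * idx + (1 if v > key else 0)
                if child in self.nodes:
                    next_regs[d] = (v, child)  # hand off to the next level
                else:
                    self.misses += 1  # fell off the tree: not a watched pattern
        self.regs = next_regs
        self.cycles += 1

    def profile(self, stream):
        """Feed one value per cycle, then drain the values still in flight."""
        for v in stream:
            self.clock(v)
        for _ in range(self.depth - 1):
            self.clock()


profiler = TreeProfilerModel([0x100, 0x104, 0x108, 0x10C, 0x200, 0x204, 0x300])
profiler.profile([0x100, 0x104, 0x100, 0x300, 0x999, 0x100])
```

In this model every stage works on a different value in the same cycle, so a stream of N values through a tree of d levels finishes in N + d - 1 cycles: the tree depth adds latency but does not reduce throughput.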