A fast on-chip profiler memory

Authors:
Roman Lysecky, University of California, Riverside
Susan Cotterell, University of California, Riverside
Frank Vahid, University of California, Riverside

DAC '02: Proceedings of the 39th annual Design Automation Conference, June 2002, pages 28–33
https://doi.org/10.1145/513918.513928

Published: 10 June 2002

ABSTRACT

Profiling an application executing on a microprocessor is part of the solution to numerous software and hardware optimization and design automation problems. Most current profiling techniques suffer from runtime overhead, inaccuracy, or slowness, and the traditional non-intrusive method of using a logic analyzer does not work for today's systems-on-a-chip with embedded cores. We introduce a novel on-chip memory architecture that overcomes these limitations. The architecture, which we call ProMem, is based on a pipelined binary tree structure. It achieves single-cycle throughput, so it can keep up with today's fastest pipelined processors. It can also be laid out efficiently and scales very well, becoming more efficient the larger it gets. The memory can be used in a wide variety of common profiling situations, such as instruction profiling, value profiling, and network traffic profiling, which in turn can be used to guide numerous design automation tasks.
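
The abstract does not describe ProMem's internals, so the following is only a rough software model of how a pipelined binary tree could sustain one lookup per cycle. It assumes that the watched patterns are stored as keys of a balanced binary search tree, that each tree level acts as one pipeline stage, and that a matching node increments a counter. The names `PipelinedTreeProfiler`, `clock`, and `profile` are hypothetical and do not come from the paper.

```python
class PipelinedTreeProfiler:
    """Toy cycle-level model: one tree level per pipeline stage (an assumption)."""

    def __init__(self, patterns):
        keys = sorted(set(patterns))
        self.levels = []                       # levels[d] maps node index -> key
        self._build(keys, 0, 0)
        self.counts = {k: 0 for k in keys}     # one counter per watched pattern
        self.stages = [None] * len(self.levels)  # in-flight (key, node index) per level
        self.cycles = 0

    def _build(self, keys, depth, index):
        # Place the median at this node so the tree stays balanced.
        if not keys:
            return
        if depth == len(self.levels):
            self.levels.append({})
        mid = len(keys) // 2
        self.levels[depth][index] = keys[mid]
        self._build(keys[:mid], depth + 1, 2 * index)
        self._build(keys[mid + 1:], depth + 1, 2 * index + 1)

    def clock(self, new_key=None):
        """Advance one cycle; at most one new key may enter the root stage."""
        # Visit the deepest stage first so every in-flight key moves one level.
        for d in range(len(self.stages) - 1, -1, -1):
            item, self.stages[d] = self.stages[d], None
            if item is None:
                continue
            key, idx = item
            node = self.levels[d].get(idx)
            if node is None:
                continue                       # fell off the tree: not a watched pattern
            if key == node:
                self.counts[key] += 1          # hit: bump this pattern's counter
            elif d + 1 < len(self.stages):
                # Go left (2*idx) for smaller keys, right (2*idx + 1) for larger ones.
                self.stages[d + 1] = (key, 2 * idx + (key > node))
        if new_key is not None and self.stages:
            self.stages[0] = (new_key, 0)
        self.cycles += 1


def profile(patterns, trace):
    """Feed one trace entry per cycle, then drain the pipeline."""
    profiler = PipelinedTreeProfiler(patterns)
    for key in trace:
        profiler.clock(key)
    for _ in range(len(profiler.stages)):
        profiler.clock()
    return profiler.counts, profiler.cycles
```

In this model a trace of N entries over a tree of depth D finishes in N + D cycles: the depth adds latency only, while a new pattern is still accepted every cycle, which is the property the abstract calls single-cycle throughput.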

Index Terms

A fast on-chip profiler memory
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published in
DAC '02: Proceedings of the 39th annual Design Automation Conference
June 2002
956 pages
ISBN:1581134614
DOI:10.1145/513918
General Chair:
Bryan Ackland
Agere Systems, Inc., Holmdel, NJ
Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
adaptive architectures
binary tree
embedded CAD
embedded systems
low power
memory design
platform tuning
profiling
system-on-a-chip
Acceptance Rates
DAC '02 paper acceptance rate: 147 of 491 submissions, 30%
Overall acceptance rate: 1,770 of 5,499 submissions, 32%