Continuous profiling: where have all the cycles gone?

Authors:
Jennifer M. Anderson, Digital Equipment Corp., Palo Alto, CA
Lance M. Berc, Digital Equipment Corp., Palo Alto, CA
Jeffrey Dean, Digital Equipment Corp., Palo Alto, CA
Sanjay Ghemawat, Digital Equipment Corp., Palo Alto, CA
Monika R. Henzinger, Digital Equipment Corp., Palo Alto, CA
Shun-Tak A. Leung, Digital Equipment Corporation, Palo Alto, CA
Richard L. Sites, Digital Equipment Corporation, Palo Alto, CA
Mark T. Vandevoorde, Digital Equipment Corporation, Palo Alto, CA
Carl A. Waldspurger, Digital Equipment Corporation, Palo Alto, CA
William E. Weihl, Digital Equipment Corporation, Palo Alto, CA

ACM Transactions on Computer Systems, Volume 15, Issue 4, pp. 357–390. https://doi.org/10.1145/265924.265925

Published: 01 November 1997

Abstract

This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting of where time is being spent, down to the level of pipeline stalls incurred by individual instructions. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
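The abstract's sampling rate implies an average sampling interval, and its instruction-level accounting rests on converting sample counts into time. A minimal Python sketch under stated assumptions: each sample is taken to stand for one average sampling period of cycles, and per-instruction execution counts and stall-free issue cycles are supplied as inputs rather than estimated. The function `estimate_stalls` and its parameters are hypothetical and illustrative, not the interface of the DCPI analysis tools, which must estimate execution frequencies themselves.

```python
from collections import Counter

# Figures quoted in the abstract.
CLOCK_HZ = 333_000_000   # 333 MHz processor
SAMPLES_PER_SEC = 5200   # samples per second per processor

# Average cycles between consecutive samples: 333e6 / 5200 ~= 64,038.
cycles_per_sample = CLOCK_HZ / SAMPLES_PER_SEC


def estimate_stalls(pc_samples, exec_counts, issue_cycles,
                    period_cycles=cycles_per_sample):
    """Hypothetical per-instruction accounting from PC samples.

    Assumes each sample represents period_cycles cycles on average, and
    that exec_counts (executions per PC) and issue_cycles (stall-free
    cycles per execution) are known; both are illustrative inputs.
    Returns {pc: (cycles_per_execution, stall_cycles_per_execution)}.
    """
    report = {}
    for pc, n in Counter(pc_samples).items():
        total_cycles = n * period_cycles            # samples -> estimated time
        per_exec = total_cycles / exec_counts[pc]   # average cycles per execution
        stalls = max(0.0, per_exec - issue_cycles[pc])  # time beyond issue cost
        report[pc] = (per_exec, stalls)
    return report


if __name__ == "__main__":
    print(f"~{cycles_per_sample:,.0f} cycles between samples")
    samples = [0x10, 0x10, 0x10, 0x20]
    print(estimate_stalls(samples, {0x10: 100, 0x20: 100},
                          {0x10: 1, 0x20: 1}, period_cycles=100))
```

For example, with a 100-cycle period, three samples at a PC that executed 100 times with a 1-cycle issue cost yield 3.0 cycles per execution, 2.0 of them attributed to stalls; the single sample at the other PC yields 1.0 cycle per execution and no stalls.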


Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering


Published in
ACM Transactions on Computer Systems Volume 15, Issue 4
Nov. 1997
92 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/265924
Editor:
Kenneth P. Birman
Cornell Univ., Ithaca, NY
Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
performance understanding
performance-monitoring hardware
profiling
program analysis
Article Metrics
- 260 Total Citations
- 2,327 Total Downloads
- Downloads (Last 12 months): 247
- Downloads (Last 6 weeks): 32
