Continuous profiling: where have all the cycles gone?

Published: 01 November 1997

Abstract

This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
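To put the sampling rate quoted above in perspective, the following is a minimal sketch of the arithmetic, not the system's actual implementation. It converts the abstract's figures (5200 samples/sec. on a 333MHz processor) into an average number of cycles between samples, and shows the general idea behind any sampling profiler: an address's share of the samples is used as an estimate of its share of execution time. The function names and the sampled addresses are hypothetical and chosen only for illustration.

```python
from collections import Counter

CLOCK_HZ = 333_000_000   # 333MHz processor, as stated in the abstract
SAMPLES_PER_SEC = 5200   # lower bound on the sampling rate, as stated in the abstract


def cycles_per_sample(clock_hz: int, samples_per_sec: int) -> float:
    """Average number of processor cycles between two consecutive samples."""
    return clock_hz / samples_per_sec


def estimate_time_share(sampled_pcs):
    """Estimate each address's share of execution time from its share of samples.

    This is the generic sampling-profiler estimate, not the analysis
    performed by the tools the abstract describes.
    """
    counts = Counter(sampled_pcs)
    total = sum(counts.values())
    return {pc: n / total for pc, n in counts.items()}


if __name__ == "__main__":
    interval = cycles_per_sample(CLOCK_HZ, SAMPLES_PER_SEC)
    print(f"about {interval:.0f} cycles between samples")

    # Hypothetical program-counter samples, invented for this example.
    samples = [0x1000, 0x1004, 0x1004, 0x1004]
    for pc, share in sorted(estimate_time_share(samples).items()):
        print(f"{pc:#x}: {share:.0%} of samples")
```

Because the abstract gives the rate as "over 5200 samples/sec.", the computed interval is an upper bound on the average gap between samples under these figures.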

Published in: ACM Transactions on Computer Systems, Volume 15, Issue 4, Nov. 1997, 92 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/265924

Copyright © 1997 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
