Article
Free Access

Analysis of benchmark characteristics and benchmark performance prediction

Published: 01 November 1996

Abstract

Standard benchmarking provides run-times for given programs on given machines, but fails to provide insight into why those results were obtained (either in terms of machine or program characteristics) and fails to provide run-times for that program on some other machine, or for other programs on that machine. We have developed a machine-independent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, which dominate the benchmark results. This information helps designers improve the performance of future machines and helps users tune their applications to better utilize the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensive run-time statistics for a large set of benchmarks, including the SPEC and Perfect Club suites. We show how these statistics can be used to identify important shortcomings in the programs. In addition, we give execution time estimates for a large sample of programs and machines and compare these against benchmark results. Finally, we develop a metric for program similarity that makes it possible to classify benchmarks with respect to a large set of characteristics.
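The prediction step described above amounts to combining a program's dynamic operation counts with a machine's per-operation timings. The following is a minimal sketch of that idea, assuming a simple linear model; the Python code, operation names, and numbers are illustrative placeholders, not measurements or an implementation from the paper.

```python
# Minimal sketch: estimate run time as a linear combination of
# (dynamic count of each abstract operation in the program) and
# (measured time per operation on the machine).
# All operation names and values below are hypothetical.

def predict_runtime(op_counts, op_times):
    """Sum, over abstract operations, of count * per-operation time (seconds)."""
    return sum(count * op_times[op] for op, count in op_counts.items())

# Program characterization: dynamic operation counts (hypothetical).
program_counts = {
    "flop_add_single": 2.0e9,
    "flop_mul_single": 1.5e9,
    "array_ref_1d": 3.2e9,
    "branch": 4.0e8,
}

# Machine characterization: seconds per abstract operation (hypothetical).
machine_times = {
    "flop_add_single": 4.0e-8,
    "flop_mul_single": 6.0e-8,
    "array_ref_1d": 3.0e-8,
    "branch": 2.0e-8,
}

print(f"Predicted run time: {predict_runtime(program_counts, machine_times):.1f} s")
```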



Reviews

Reviewer: Guenter Haring

A machine-independent model of program execution to characterize both machine performance and program execution is described. The model consists of a set of abstract operations representing the basic operators and language constructs in programs. A special benchmark (machine characterizer) is used to measure the time it takes to execute each abstract operation. Details of this procedure have been published in another paper. This paper focuses on program characterization and execution time prediction, using a large set of programs including the Perfect Club and SPEC benchmarks as well as various applications and synthetic benchmarks, all based on Fortran.

After a short introduction, the paper presents an overview of the abstract model, including a description of its parts, among them the machine characterizer, program analyzer, and execution predictor, which allow a user to predict the performance of a program or machine. Various factors influencing the execution time, such as memory hierarchy, compiler optimization, and vectorization, are briefly discussed in this section. The investigations in this paper are based on unoptimized code and ignore memory hierarchy delays. More details on the extension of the basic abstract model to include these aspects are to be found in other publications by these authors. After the set of 28 programs and 18 machines is described, the predicted execution times for 244 program-machine combinations are compared to the real execution times. The predictions are good. The authors explain some of the larger errors.

The main part of the paper is devoted to program characterization provided by the dynamic statistics of the programs, deepening readers' understanding of each benchmark and its relation to the machine. It constitutes a summary of the results, covering statistics on basic blocks and statements, arithmetic and logical operations, references to array and scalar variables, execution time distribution, dynamic distribution of basic blocks, distribution of abstract operations, the amount of skewness in programs, and the distribution of errors. Interesting observations and conclusions are provided, including similarities between benchmarks. Unfortunately, the authors do not say either how the clustering according to the similarity of distributions was done or how the number of clusters was determined.

In the last section of the paper, the authors use the dynamic statistics of the benchmarks to define a metric of similarity among the programs. Two different metrics are presented and compared. The first metric is based on the expectation that programs which execute similar operations will produce similar runtime results, that is, similar mean estimates of processing speed for a given machine. On the other hand, benchmarks that yield proportional performance (real execution time) on a variety of machines should be considered similar. The investigations show that the two metrics are highly correlated.

The paper should interest everyone involved in comparing the performance of various systems based on benchmarks or involved in workload modeling based on existing or synthetic benchmarks. To gain a complete understanding of this field, I recommend reading the authors' related publications, although this paper is self-contained and well done.
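As a hedged illustration of the first similarity metric the review mentions (programs with similar operation mixes are expected to behave similarly), one might compare normalized dynamic-operation distributions. The Euclidean distance used below is an assumed, illustrative choice rather than necessarily the metric defined in the paper, and the benchmark mixes are hypothetical.

```python
import math

def normalize(counts):
    """Convert raw dynamic operation counts into a frequency distribution."""
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}

def mix_distance(counts_a, counts_b):
    """Euclidean distance between two programs' normalized operation mixes.
    Smaller values suggest more similar dynamic behavior (illustrative metric only)."""
    pa, pb = normalize(counts_a), normalize(counts_b)
    ops = set(pa) | set(pb)
    return math.sqrt(sum((pa.get(op, 0.0) - pb.get(op, 0.0)) ** 2 for op in ops))

# Hypothetical operation mixes for two benchmarks.
bench_x = {"flop_add_single": 5e8, "array_ref_1d": 9e8, "branch": 1e8}
bench_y = {"flop_add_single": 4e8, "array_ref_1d": 1e9, "branch": 2e8}
print(f"Mix distance: {mix_distance(bench_x, bench_y):.3f}")
```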
