Abstract
Standard benchmarking provides run-times for given programs on given machines, but it fails to explain why those results were obtained (in terms of either machine or program characteristics), and it cannot predict run-times for that program on some other machine, or for other programs on that machine. We have developed a machine-independent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify the operations, either on the machine or in the programs, that dominate the benchmark results. This information helps designers improve the performance of future machines and helps users tune their applications to better exploit the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensive run-time statistics for a large set of benchmarks, including the SPEC and Perfect Club suites, and we show how these statistics can be used to identify important shortcomings in the programs. In addition, we give execution time estimates for a large sample of programs and machines and compare these against measured benchmark results. Finally, we develop a metric for program similarity that makes it possible to classify benchmarks with respect to a large set of characteristics.
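To make the merging step concrete, the sketch below (a hypothetical Python rendering; the operation names, counts, and timings are invented for illustration, not taken from the paper) shows the kind of linear model the abstract describes: a program is characterized by its dynamic operation counts, a machine by its time per abstract operation, and the predicted run-time is their dot product. It also includes one plausible reading of the similarity metric.

```python
import math

def predict_runtime(op_counts, op_times):
    """Estimated time = sum over abstract operations of count * time-per-op."""
    return sum(op_counts[op] * op_times[op] for op in op_counts)

# Hypothetical characterizations (illustrative values only):
program = {"flop_add": 2.0e8, "flop_mul": 1.5e8, "mem_ref": 5.0e8}  # dynamic counts
machine = {"flop_add": 4e-9, "flop_mul": 6e-9, "mem_ref": 1e-8}     # seconds per op

print(f"estimated run-time: {predict_runtime(program, machine):.2f} s")

# One plausible form of a program-similarity metric (an assumption, not the
# paper's definition): Euclidean distance between normalized
# operation-frequency vectors, so programs that spend their time on the same
# mix of operations are "close".
def similarity_distance(p, q):
    ops = set(p) | set(q)
    total_p, total_q = sum(p.values()), sum(q.values())
    return math.sqrt(sum(((p.get(o, 0.0) / total_p) -
                          (q.get(o, 0.0) / total_q)) ** 2 for o in ops))
```

Because the machine and program characterizations are kept separate, measuring m machines and n programs yields m + n sets of measurements but m x n run-time predictions, which is the economy the abstract's "arbitrary machine/program combinations" refers to.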