Abstract
Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This article presents the Sampling Microarchitecture Simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates. Analysis of the SPEC CPU2000 benchmark suite shows that CPI and energy per instruction (EPI) can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty, which we empirically bound to ∼2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
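The sampling idea summarized in the abstract can be illustrated with textbook sampling theory. The sketch below is not the SMARTS implementation: the function names, the synthetic per-unit CPI values, and the large-population formula n ≥ (z·V/ε)² (with V the coefficient of variation, ε the target relative error, and z ≈ 3 for 99.7% confidence under a normal approximation) are assumptions made for illustration only.

```python
import math
import random
import statistics


def required_samples(cv, rel_error, z):
    """Smallest n such that z * cv / sqrt(n) <= rel_error.

    Assumes the population is much larger than the sample, so the
    finite-population correction is ignored.
    """
    return math.ceil((z * cv / rel_error) ** 2)


def systematic_sample(values, interval, offset=0):
    """Select every `interval`-th element, starting at index `offset`."""
    return values[offset::interval]


def estimate_mean(sample, z):
    """Return (sample mean, relative half-width of the confidence interval)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    cv = statistics.stdev(sample) / mean  # coefficient of variation
    return mean, z * cv / math.sqrt(n)


def demo():
    """Hypothetical run on synthetic per-unit CPI values (not real data)."""
    rng = random.Random(0)
    population = [1.0 + rng.random() for _ in range(1_000_000)]

    # Pilot sample to estimate variability, then size the real sample.
    pilot = systematic_sample(population, interval=10_000)
    cv = statistics.stdev(pilot) / statistics.fmean(pilot)
    n = required_samples(cv, rel_error=0.03, z=3.0)

    interval = max(1, len(population) // n)
    mean, half_width = estimate_mean(systematic_sample(population, interval), z=3.0)
    print(f"n={n}, estimated CPI={mean:.3f}, relative half-width={half_width:.2%}")


if __name__ == "__main__":
    demo()
```

The required sample size depends on the variability of the measured metric, not on the length of the benchmark. This is consistent with the abstract's point that a small measured subset can yield a quantifiable confidence for a full-length run.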