
Statistical sampling of microarchitecture simulation

Published: 01 July 2006

Abstract

Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This article presents the Sampling Microarchitecture Simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates.

Analysis of the SPEC CPU2000 benchmark suite shows that CPI and energy per instruction (EPI) can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty which we empirically bound to ∼2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
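The statistical core of the approach described above — draw a systematic sample of measurement units, then report the sample mean with a confidence interval of a chosen width — can be illustrated with a minimal sketch. This is not the SMARTS implementation; the per-unit CPI values below are synthetic stand-ins for the results of detailed simulation of sampled instruction intervals, and the 3-sigma multiplier corresponds to the 99.7% confidence level quoted in the abstract.

```python
import math
import random

def systematic_sample(population, n_units):
    """Draw a systematic sample: every k-th unit, starting at a random offset."""
    k = len(population) // n_units
    start = random.randrange(k)
    return population[start::k][:n_units]

def mean_and_halfwidth(sample, z=3.0):
    """Sample mean and the half-width of a z-sigma confidence interval
    (z = 3.0 gives ~99.7% confidence under approximate normality)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return mean, z * math.sqrt(var / n)

# Hypothetical per-interval CPI measurements standing in for a full
# benchmark's worth of detailed-simulation results.
random.seed(0)
population = [1.0 + 0.2 * random.random() for _ in range(1_000_000)]

sample = systematic_sample(population, 10_000)
mean_cpi, halfwidth = mean_and_halfwidth(sample)
print(f"estimated CPI = {mean_cpi:.3f} +/- {halfwidth:.3f}")
```

The key design point the abstract relies on is that the interval half-width shrinks as 1/√n, so the required sample size for a target confidence can be computed up front from an estimate of the population's variability, independent of benchmark length.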

