An Evaluation of High-Level Mechanistic Core Models

Published: 25 August 2014

Abstract

Large core counts and complex cache hierarchies are increasing the burden placed on commonly used simulation and modeling techniques. Although analytical models provide fast results, they do not apply to complex, many-core shared-memory systems. In contrast, detailed cycle-level simulation can be accurate but also tends to be slow, which limits the number of configurations that can be evaluated. A middle ground is needed that provides for fast simulation of complex many-core processors while still providing accurate results.

In this article, we explore, analyze, and compare the accuracy and simulation speed of high-abstraction core models as a potential solution to slow cycle-level simulation. We describe a number of enhancements to interval simulation that improve its accuracy while maintaining simulation speed. In addition, we introduce the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail. We also show that using accurate core models like these is important for memory subsystem studies, and that simple, naive models, like a one-IPC core model, can lead to misleading and incorrect results and conclusions in practical design studies. Validation against real hardware shows good accuracy: the IW-centric model has an average single-core error of 11.1% and a maximum error of 18.8%, at a 1.5× slowdown compared to interval simulation.

Published in

ACM Transactions on Architecture and Code Optimization, Volume 11, Issue 3
October 2014
298 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2658949

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 25 August 2014
• Accepted: 1 May 2014
• Revised: 1 March 2014
• Received: 1 December 2013

Qualifiers

• research-article
• Research
• Refereed
