ABSTRACT
Simulation is an important means of evaluating new microarchitectures. With the advent of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequencies, the wall-clock time required to simulate an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially for multi-threaded workloads, because simultaneously executing threads introduce indeterminacy. In this paper, we propose a technique for speeding up multi-core simulation. To achieve the speedup, the models of the processor core and the cache are replaced with functional models. A timed Petri net model is used to estimate the execution time of the processor, and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict the performance of data-parallel applications or multiprogrammed workloads on a CMP platform with various cache hierarchies and a shared-bus interconnect. The error in the estimated execution time of an application is within 6%. The average speedup over the cycle-accurate simulator ranges from 2x to 4x.
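The idea of estimating memory access latency from hit/miss information can be illustrated with a minimal sketch. This is not the authors' simulator or their timed Petri net model: it substitutes a simple additive latency model, and the class FunctionalCache, the function estimate_cycles, the LRU replacement policy, and the hit_latency and miss_penalty values are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class FunctionalCache:
    """Hypothetical set-associative LRU cache that records only hits and
    misses; it carries no timing information of its own."""

    def __init__(self, num_sets=64, ways=4, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit and False on a miss, updating LRU state."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False


def estimate_cycles(addresses, cache, hit_latency=2, miss_penalty=100):
    """Estimate memory cycles from hit/miss counts alone.

    The latencies are assumed values; a simple sum stands in here for
    the paper's timed Petri net model.
    """
    addresses = list(addresses)
    hits = sum(1 for a in addresses if cache.access(a))
    misses = len(addresses) - hits
    return hits * hit_latency + misses * miss_penalty, hits, misses


if __name__ == "__main__":
    cache = FunctionalCache(num_sets=2, ways=1, line_bytes=64)
    cycles, hits, misses = estimate_cycles([0, 0, 64, 128, 0], cache)
    print(cycles, hits, misses)
```

Because the cache model only classifies each access, it runs much faster than a cycle-level model; the timing estimate is applied afterwards from the counts.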
- Accelerating multi-core simulators