research-article

Accelerating multi-core simulators

Authors:
Aparna Mandke Dani, Indian Institute of Science Bangalore, India
Keshavan Varadarajan, Indian Institute of Science Bangalore, India
Bharadwaj Amrutur, Indian Institute of Science Bangalore, India
Y. N. Srikant, Indian Institute of Science Bangalore, India

SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing, March 2010, Pages 2377–2382
https://doi.org/10.1145/1774088.1774582

Published: 22 March 2010

ABSTRACT

Simulation is an important means of evaluating new microarchitectures. With the advent of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequencies, the wall-clock time required to simulate an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially for multi-threaded workloads, because of the indeterminacy introduced by simultaneously executing threads. In this paper, we propose a technique for speeding up multi-core simulation. The models of the processor core and the cache are replaced with functional models to achieve speedup. A timed Petri net model is used to estimate the execution time of the processor, and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict the performance of data-parallel applications or multiprogramming workloads on CMP platforms with various cache hierarchies and a shared bus interconnect. The error in the estimated execution time of an application is within 6%. The average speedup achieved ranges from 2x to 4x over the cycle-accurate simulator.
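
The idea in the abstract, a functional cache model that reports only hits and misses feeding a timed model that accumulates latencies, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the direct-mapped private caches, the latencies of 1 and 20 cycles, the names `DirectMappedCache` and `estimate_cycles`, and the very small net itself (one "ready" token per core, one shared "bus" token, a hit transition that consumes only the core token, and a miss transition that consumes both). Tokens carry timestamps, and a transition fires at the latest timestamp among its input tokens.

```python
from collections import deque


class DirectMappedCache:
    """Functional cache model (hypothetical): stores tags only, reports hit or miss."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_lines
        hit = self.tags[index] == block
        self.tags[index] = block  # fill the line on a miss, no-op on a hit
        return hit


def estimate_cycles(traces, hit_latency=1, miss_latency=20):
    """Estimate total cycles for one address trace per core.

    Timed-Petri-net-style evaluation (illustrative, not the paper's net):
    core_ready[c] is the timestamp of core c's 'ready' token and
    bus_free is the timestamp of the single shared 'bus' token.
    """
    caches = [DirectMappedCache() for _ in traces]
    pending = [deque(t) for t in traces]
    core_ready = [0] * len(traces)
    bus_free = 0

    while any(pending):
        # Fire the enabled transition whose core token is available earliest
        # (ties broken by core id, so the result is deterministic).
        core = min(
            (c for c in range(len(traces)) if pending[c]),
            key=lambda c: (core_ready[c], c),
        )
        addr = pending[core].popleft()
        if caches[core].access(addr):
            # Hit transition: needs only the core token.
            core_ready[core] += hit_latency
        else:
            # Miss transition: needs the core token and the bus token,
            # so it starts when both are available.
            start = max(core_ready[core], bus_free)
            core_ready[core] = start + miss_latency
            bus_free = core_ready[core]

    return max(core_ready, default=0)


if __name__ == "__main__":
    # One core: miss, hit, miss, hit -> 20 + 1 + 20 + 1 cycles.
    print(estimate_cycles([[0, 4, 16, 0]]))
    # Two cores that both miss: the second waits for the shared bus.
    print(estimate_cycles([[0], [0]]))
```

Because only tags and timestamps are tracked, the cost per access is a few integer operations, which is the general reason functional models run faster than cycle-accurate ones; the accuracy and speedup figures in the abstract refer to the authors' model, not to this sketch.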

Published in
SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
March 2010
2712 pages
ISBN:9781605586397
DOI:10.1145/1774088
Conference Chairs:
Sung Y. Shin, South Dakota State University
Sascha Ossowski, University Rey Juan Carlos, Spain
Michael Schumacher, University of Applied Sciences Western Switzerland, Switzerland
Program Chairs:
Mathew J. Palakal, Indiana University Purdue University
Chih-Cheng Hung, Southern Polytechnic State University
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache simulator
chip multi-core
instruction set simulator
multi-core platform
timed petri-nets
Acceptance Rates
SAC '10 Paper Acceptance Rate: 364 of 1,353 submissions, 27%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%