Abstract
As multicore computer systems become increasingly complex, parallel simulation is becoming an important tool for exploring the design space and evaluating design tradeoffs. The success of parallel simulation depends on maintaining a high degree of parallelism under synchronization constraints. This article presents FNM, an enhanced Null-message algorithm that uses domain-specific knowledge to improve on the performance of the basic Null-message algorithm. Based on their runtime states, the components of the simulation model make conservative forecasts of future interprocess events. The forecasts are carried in the enhanced Null-messages, and by combining the forecasts from both sides of an interprocess link, FNM achieves a dynamic system lookahead much greater than what the static system structure provides. This improved lookahead allows better exploitation of the simulation model's inherent parallelism. Compared with the basic Null-message algorithm, FNM greatly reduces the number of Null-messages and thereby improves parallel simulation performance, while guaranteeing the same simulation correctness. In tests on cycle-level models with up to 128 cores, FNM shows good scalability and proves to be an effective method.
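The lookahead idea can be illustrated with a small sketch. This is not the FNM algorithm itself, whose exact combination rule the abstract does not give; it is one plausible reading under stated assumptions. The function names, the parameters `own_earliest` and `peer_forecast_arrival`, and the rule that a process sends only when its own state triggers an event or when it reacts to an incoming message are all hypothetical.

```python
def basic_bound(clock: int, link_delay: int) -> int:
    """Timestamp promised by a basic Null-message.

    With only static knowledge, the sender can promise nothing better than
    "no message from me arrives before my clock plus the link delay".
    """
    return clock + link_delay


def forecast_bound(clock: int, link_delay: int,
                   own_earliest: int, peer_forecast_arrival: int) -> int:
    """Timestamp promised by a forecast-carrying Null-message (illustrative).

    own_earliest:          earliest tick at which this process would emit an
                           event on its own, judged from its runtime state
                           (hypothetical input).
    peer_forecast_arrival: earliest tick at which a message from the peer
                           could arrive, taken from the peer's forecast
                           (hypothetical input).
    Assumption: the process sends only for one of those two reasons, and it
    may react to an incoming message with zero latency, which keeps the
    bound conservative.
    """
    earliest_send = max(clock, min(own_earliest, peer_forecast_arrival))
    return earliest_send + link_delay


# Example: at tick 100 on a 1-tick link, the process expects no event of its
# own before tick 140, and the peer's forecast rules out arrivals before 120.
static_promise = basic_bound(100, 1)
dynamic_promise = forecast_bound(100, 1, own_earliest=140,
                                 peer_forecast_arrival=120)
```

In this sketch the forecast-based promise is never smaller than the basic one, so the receiver can advance at least as far per Null-message. A larger promise per message is the mechanism by which fewer Null-messages would be needed.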
Index Terms
- FNM: An Enhanced Null-Message Algorithm for Parallel Simulation of Multicore Systems