ABSTRACT
Parallel Discrete Event Simulation (PDES) harnesses parallel processing to improve the performance and capacity of simulation, supporting larger models, in more detail, and for more scenarios. PDES engines are typically designed and evaluated assuming a homogeneous parallel computing system that is dedicated to the simulation application. In this paper, we first show that interference from other users, even a single process in an arbitrarily large parallel environment, can dramatically slow down the simulation. We define a new metric, which we call proportional slowdown, that represents the idealized target for graceful slowdown in the presence of interference. We identify some of the reasons why simulators fall far short of proportional slowdown. Based on these observations, we design alternative simulation scheduling and mapping algorithms that tolerate interference better. More precisely, the most resilient simulators allow dynamic mapping of simulation event execution to processing resources (a work pool model). However, this model has significant overhead and can substantially degrade locality. Thus, we propose a locality-aware adaptive dynamic-mapping (LADM) algorithm for PDES on multi-core systems. LADM reduces the number of active threads in the presence of interference, which avoids having threads disabled by context switching. We show that LADM can substantially reduce the impact of interference while maintaining memory locality, narrowing the gap with proportional slowdown. LADM and similar techniques can also help in situations where there is load imbalance or processor heterogeneity.
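The abstract does not state a formula for proportional slowdown, so the following is only an illustrative sketch under stated assumptions. It assumes that each interfering process takes a fixed share of one core (here 0.5, as under fair time-sharing with one simulation thread), and that the ideal simulator spreads the lost capacity evenly over all cores. The function names `proportional_slowdown` and `lockstep_slowdown`, and the lockstep model used for contrast, are hypothetical and are not taken from the paper.

```python
def proportional_slowdown(cores: int, interfering: int = 1, share: float = 0.5) -> float:
    """Idealized slowdown factor (assumed model, not the paper's definition).

    The interfering processes remove `interfering * share` core-equivalents
    of capacity; the ideal simulator slows down only in proportion to that loss.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    lost = interfering * share  # capacity taken away, in core-equivalents
    if lost < 0 or lost >= cores:
        raise ValueError("lost capacity must be in [0, cores)")
    return cores / (cores - lost)


def lockstep_slowdown(share: float = 0.5) -> float:
    """Slowdown if every thread must wait for the one slowed thread.

    Assumed worst-case contrast: the thread sharing its core runs at
    (1 - share) of full speed and all other threads are paced by it.
    """
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    return 1.0 / (1.0 - share)


if __name__ == "__main__":
    # Hypothetical 8-core machine with a single interfering process.
    print("ideal (proportional):", proportional_slowdown(8))
    print("paced by slowest thread:", lockstep_slowdown())
```

Under these assumptions, a single interfering process on an 8-core machine costs about 1.07x in the ideal case but 2x when every thread is paced by the slowed one, which illustrates the kind of gap the abstract describes.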
Index Terms
- Interference resilient PDES on multi-core systems: towards proportional slowdown