Abstract
Parallel discrete event simulation (PDES) harnesses parallel processing to improve the performance and capacity of simulation, supporting larger and more detailed models simulated across more scenarios. Interference from other users can dramatically slow the simulation. Interference is typically managed with operating system scheduling support (e.g., gang scheduling), a heavyweight approach with some drawbacks. We propose an application-level approach to interference resilience based on alternative simulation scheduling and mapping algorithms. The most resilient simulators map simulation event execution to processing resources dynamically (a work pool model), but this model has significant scheduling overhead and poor cache locality. We therefore investigate application-level interference mitigation, in which the application detects interference and reacts by changing how tasks are allocated to threads. Specifically, we propose a locality-aware adaptive dynamic mapping (LADM) algorithm that detects interference and adjusts the number of active threads on the fly. LADM keeps the application from stalling when threads are inactive due to context switching. We investigate different mechanisms for monitoring the level of interference and different approaches for remapping tasks. We show that LADM can substantially reduce the impact of interference while maintaining memory locality.
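The detect-and-remap idea described in the abstract can be illustrated with a minimal sketch. This is not the LADM algorithm itself, whose details the abstract does not give. It assumes a hypothetical monitor that reports how many events each thread processed in the last interval, treats a thread as suffering interference when it falls below a fraction of the best thread's progress, and reassigns logical processes (LPs) to the remaining threads in contiguous blocks as a simple stand-in for locality awareness. The function names, the `threshold` parameter, and the contiguous-block policy are all assumptions made for illustration.

```python
from typing import Dict, List


def detect_active_threads(progress: List[int], threshold: float = 0.5) -> List[int]:
    """Return ids of threads that appear free of interference.

    `progress[t]` is the number of events thread t processed in the last
    monitoring interval (a hypothetical metric). A thread counts as active
    if it reached at least `threshold` times the best thread's progress.
    """
    best = max(progress)
    if best == 0:
        # No thread made progress, so there is no basis for excluding any.
        return list(range(len(progress)))
    return [t for t, p in enumerate(progress) if p >= threshold * best]


def remap_contiguous(num_lps: int, active: List[int]) -> Dict[int, List[int]]:
    """Assign LPs 0..num_lps-1 to the active threads in contiguous blocks.

    Contiguous blocks keep neighbouring LPs on the same thread, which is an
    assumed proxy for preserving cache locality.
    """
    mapping: Dict[int, List[int]] = {}
    base, extra = divmod(num_lps, len(active))
    start = 0
    for i, t in enumerate(active):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        mapping[t] = list(range(start, start + size))
        start += size
    return mapping


if __name__ == "__main__":
    # Thread 2 lags far behind the others, so it is taken out of rotation.
    progress = [100, 95, 20, 98]
    active = detect_active_threads(progress)
    print(active)
    print(remap_contiguous(8, active))
```

In this sketch the lagging thread receives no LPs until a later monitoring interval shows it keeping pace again, so the remaining threads do not wait on work assigned to a descheduled thread.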
Index Terms
- AIR: Application-Level Interference Resilience for PDES on Multicore Systems