
AIR: Application-Level Interference Resilience for PDES on Multicore Systems

Published: 16 April 2015
Abstract

Parallel discrete event simulation (PDES) harnesses parallel processing to improve the performance and capacity of simulation, supporting bigger and more detailed models simulated for more scenarios. The presence of interference from other users can lead to dramatic slowdown in the performance of the simulation. Interference is typically managed using operating system scheduling support (e.g., gang scheduling), a heavyweight approach with some drawbacks. We propose an application-level approach to interference resilience through alternative simulation scheduling and mapping algorithms. More precisely, the most resilient simulators allow dynamic mapping of simulation event execution to processing resources (a work pool model). However, this model has significant scheduling overhead and poor cache locality. Thus, we investigate using application-level interference mitigation where the application detects the presence of interference and reacts by changing the thread task allocation. Specifically, we propose a locality-aware adaptive dynamic mapping (LADM) algorithm that adjusts the number of active threads on the fly by detecting the presence of interference. LADM avoids having the application stall when threads are inactive due to context switching. We investigate different mechanisms for monitoring the level of interference and different approaches for remapping tasks. We show that LADM can substantially reduce the impact of interference while maintaining memory locality.
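The adaptive idea in the abstract (detect interference, shrink the set of active threads, and remap work while preserving locality) can be illustrated with a minimal sketch. This is not the authors' LADM implementation. The names `choose_active_threads`, `remap_groups`, `stolen_fraction`, and the 0.3 threshold are hypothetical, and the sketch assumes some external monitor already reports, for each thread, the fraction of recent time it spent descheduled.

```python
from typing import Dict, List


def choose_active_threads(stolen_fraction: List[float],
                          threshold: float = 0.3) -> List[int]:
    """Return ids of threads that appear mostly free of interference.

    stolen_fraction[t] is an assumed measurement: the share of recent
    wall-clock time during which thread t was not running.
    """
    active = [t for t, s in enumerate(stolen_fraction) if s < threshold]
    if not active:
        # Never deactivate everything: keep the least-interfered thread.
        active = [min(range(len(stolen_fraction)),
                      key=stolen_fraction.__getitem__)]
    return active


def remap_groups(home: Dict[int, int], active: List[int]) -> Dict[int, int]:
    """Assign each task group to an active thread.

    home maps a task-group id to its usual (home) thread. Groups whose
    home thread is still active stay where they are, which keeps their
    data warm in that core's cache. Groups from deactivated threads are
    moved as a whole to the least-loaded active thread.
    """
    load = {t: 0 for t in active}
    mapping: Dict[int, int] = {}
    # First pass: groups that can stay on their home thread.
    for group, thread in sorted(home.items()):
        if thread in load:
            mapping[group] = thread
            load[thread] += 1
    # Second pass: displaced groups go to the least-loaded active thread
    # (ties broken by the lower thread id).
    for group, thread in sorted(home.items()):
        if thread not in load:
            target = min(active, key=lambda a: (load[a], a))
            mapping[group] = target
            load[target] += 1
    return mapping


if __name__ == "__main__":
    stolen = [0.05, 0.60, 0.10, 0.02]   # thread 1 is heavily interfered with
    active = choose_active_threads(stolen)
    print(active)
    print(remap_groups({0: 0, 1: 1, 2: 2, 3: 3}, active))
```

Moving a displaced group as a unit, rather than scattering its tasks across a shared work pool, is one simple way to trade a little load balance for locality. The monitoring mechanisms and remapping policies the paper actually evaluates may differ.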


      • Published in

        ACM Transactions on Modeling and Computer Simulation, Volume 25, Issue 3
        May 2015
        146 pages
        ISSN: 1049-3301
        EISSN: 1558-1195
        DOI: 10.1145/2764453

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 April 2015
        • Accepted: 1 December 2014
        • Revised: 1 June 2014
        • Received: 1 February 2014
        Qualifiers

        • research-article
        • Research
        • Refereed
