Abstract
Cloud and virtual machine (VM) technologies present new challenges with respect to performance and monetary cost in executing parallel discrete event simulation (PDES) applications. Once overall cost is introduced as a metric, the traditional choice of the highest-end computing configuration is no longer the obvious one. Moreover, the runtime dynamics and configuration choices of Cloud and VM platforms introduce design considerations and runtime characteristics specific to PDES over Cloud/VMs. Here, an empirical study is presented to help understand the dynamics, trends, and trade-offs in executing PDES on Cloud/VM platforms. Performance and cost measures obtained from multiple PDES applications executed on the Amazon EC2 Cloud and on a high-end VM host machine reveal new, counterintuitive VM-PDES dynamics and guidelines. One critical finding is a fundamental mismatch between hypervisor scheduler policies, which are designed for general Cloud workloads, and the virtual time ordering that PDES workloads need. This insight is supported by experimental data showing a gross deterioration in PDES performance that is traceable to VM scheduling policy. To overcome this problem, the design and implementation of a new deadlock-free scheduler algorithm, optimized specifically for PDES applications on VMs, are presented. The scalability of the scheduler has been tested with up to 128 VMs multiplexed on 32 cores, showing significant improvement in runtime relative to the default Cloud/VM scheduler. The observations, algorithmic design, and results are timely for emerging Cloud/VM-based installations, highlighting the need for PDES-specific support in high-performance discrete event simulations on Cloud/VM platforms.
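The mismatch between fair-share scheduling and virtual time ordering can be illustrated with a toy model. The sketch below is not the scheduler described in the paper, whose details the abstract does not give. It is a minimal illustration under stated assumptions: one core, a conservative synchronization rule in which a VM may advance only while its local virtual time (LVT) is within a lookahead of the global minimum, and a blocked VM that burns its whole time slice. The function name `simulate`, the policy names, and all parameter values are hypothetical.

```python
def simulate(policy, steps=(1, 2, 4, 4), lookahead=1, end_time=16):
    """Toy single-core model of VMs that each host one PDES process.

    steps[i] is how far VM i advances its local virtual time (LVT) in
    one useful time slice. A VM may advance only if its LVT is less
    than the global minimum LVT plus the lookahead (an assumed
    conservative rule); otherwise the slice is wasted.
    Returns (total_slices, wasted_slices).
    """
    n = len(steps)
    lvt = [0] * n
    total = wasted = 0
    cursor = 0  # rotation position used by the round-robin policy

    while any(t < end_time for t in lvt):
        # Only VMs that have not reached end_time still want the core.
        runnable = [i for i, t in enumerate(lvt) if t < end_time]

        if policy == "round_robin":
            # Virtual-time-oblivious: rotate through the runnable VMs.
            while cursor % n not in runnable:
                cursor += 1
            vm = cursor % n
            cursor += 1
        elif policy == "lvt_first":
            # Virtual-time-aware: run the VM that is furthest behind.
            vm = min(runnable, key=lambda i: lvt[i])
        else:
            raise ValueError(f"unknown policy: {policy}")

        total += 1
        if lvt[vm] < min(lvt) + lookahead:
            lvt[vm] += steps[vm]   # useful slice: the simulation advances
        else:
            wasted += 1            # blocked: the slice is spent waiting

    return total, wasted


rr_total, rr_wasted = simulate("round_robin")
lvt_total, lvt_wasted = simulate("lvt_first")
print("round_robin:", rr_total, "slices,", rr_wasted, "wasted")
print("lvt_first:  ", lvt_total, "slices,", lvt_wasted, "wasted")
```

In this model the least-LVT-first policy never wastes a slice, because the VM holding the global minimum is always allowed to advance, whereas the rotation policy hands slices to VMs that have run ahead and must wait. Real hypervisor schedulers and real PDES engines are far more involved, so the numbers here carry no quantitative meaning.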
Index Terms
- Efficient Parallel Discrete Event Simulation on Cloud/Virtual Machine Platforms