research-article

Can PDES scale in environments with heterogeneous delays?

Authors:
Jingjing Wang, Binghamton University, Binghamton, NY, USA
Ketan Bahulkar, Binghamton University, Binghamton, NY, USA
Dmitry Ponomarev, Binghamton University, Binghamton, NY, USA
Nael Abu-Ghazaleh, Binghamton University, Binghamton, NY, USA

SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 2013, Pages 35–46, https://doi.org/10.1145/2486092.2486098

Published: 19 May 2013

ABSTRACT

The performance and scalability of Parallel Discrete Event Simulation (PDES) are often limited by communication latencies and overheads. The emergence of multi-core processors, and their expected evolution into many-cores, offers the promise of low-latency communication and tight memory integration between cores; these properties should significantly improve the performance of PDES in such environments. However, on clusters of multi-cores (CMs), the latency and processing overheads incurred when communicating between different machines (nodes) far outweigh those between cores on the same chip, especially when commodity networking fabrics and communication software are used. It is unclear whether the low latency among cores on the same node provides any benefit when communication links across nodes are significantly worse. In this study, we examine the performance of a multi-threaded implementation of PDES on CMs. We demonstrate that inter-node communication costs impose a substantial bottleneck on PDES and that, without optimizations addressing these long latencies, multi-threaded PDES does not significantly outperform the multi-process version, despite communicating directly through shared memory within each node. We then propose three optimizations: message consolidation and routing, infrequent polling, and latency-sensitive model partitioning. We show that with these optimizations in place, the threaded implementation of PDES significantly outperforms the process-based implementation even on CMs.
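To make the first optimization concrete, the sketch below shows one common form of message consolidation: events bound for another node are buffered per destination node and shipped as a single network message, while events for threads on the same node are handed off directly. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The names `Event`, `ConsolidatingSender`, `send_fn`, and `flush` are hypothetical, and the routing component and the flush policy used in the paper are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch of per-node message consolidation; names are illustrative,
# not taken from the paper's simulator.


@dataclass(frozen=True)
class Event:
    dest_node: int      # cluster node hosting the destination LP (assumed field)
    dest_lp: int        # destination logical process (assumed field)
    timestamp: float    # simulation time of the event


class ConsolidatingSender:
    """Delivers intra-node events immediately and batches inter-node events
    so that each destination node receives one network message per flush."""

    def __init__(self, local_node: int,
                 send_fn: Callable[[int, List[Event]], None]) -> None:
        self.local_node = local_node
        self.send_fn = send_fn  # assumed transport hook (e.g. one MPI send per batch)
        self.outbox: Dict[int, List[Event]] = defaultdict(list)

    def send(self, ev: Event, deliver_local: Callable[[Event], None]) -> None:
        if ev.dest_node == self.local_node:
            deliver_local(ev)                      # same node: direct hand-off
        else:
            self.outbox[ev.dest_node].append(ev)   # remote node: defer and batch

    def flush(self) -> int:
        """Send one consolidated message per destination node; return the count."""
        sent = 0
        for node, batch in self.outbox.items():
            self.send_fn(node, batch)
            sent += 1
        self.outbox = defaultdict(list)
        return sent


# Usage: six events from node 0, spread round-robin over nodes 0, 1 and 2.
wire: List[tuple] = []
local: List[Event] = []
sender = ConsolidatingSender(0, lambda node, batch: wire.append((node, len(batch))))
for i in range(6):
    sender.send(Event(dest_node=i % 3, dest_lp=i, timestamp=float(i)), local.append)
messages = sender.flush()
print(messages, wire, len(local))  # 2 [(1, 2), (2, 2)] 2
```

In this example the four remote events travel in two network messages instead of four, at the cost of holding them until `flush` is called; how often to flush is the kind of latency trade-off the abstract's other optimizations address.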

Index Terms

1. Computing methodologies
  1. Modeling and simulation
    1. Simulation types and techniques
      1. Discrete-event simulation
      2. Massively parallel and high-performance simulations

Published in
SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2013
426 pages
ISBN: 9781450319201
DOI: 10.1145/2486092
General Chair: Margaret L. Loper, Georgia Institute of Technology, USA
Program Chair: Gabriel A. Wainer, Carleton University, Canada
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cluster of multi-cores
multi-thread
pdes
Acceptance Rates
SIGSIM PADS '13 Paper Acceptance Rate: 29 of 75 submissions, 39%
Overall Acceptance Rate: 398 of 779 submissions, 51%

Article Metrics
- 5 Total Citations
- 169 Total Downloads
- Downloads (Last 12 months): 0
- Downloads (Last 6 weeks): 0