research-article

Warp speed: executing time warp on 1,966,080 cores

Authors:
Peter D. Barnes, Lawrence Livermore National Laboratory, Livermore, CA, USA
Christopher D. Carothers, Rensselaer Polytechnic Institute, Troy, NY, USA
David R. Jefferson, Lawrence Livermore National Laboratory, Livermore, CA, USA
Justin M. LaPre, Rensselaer Polytechnic Institute, Troy, NY, USA

SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 2013, Pages 327–336. https://doi.org/10.1145/2486092.2486134

Published: 19 May 2013

ABSTRACT

Time Warp is an optimistic synchronization protocol for parallel discrete event simulation that coordinates the available parallelism through its rollback and antimessage mechanisms. In this paper we present the results of a strong scaling study of the ROSS simulator running Time Warp with reverse computation and executing the well-known PHOLD benchmark on Lawrence Livermore National Laboratory's Sequoia Blue Gene/Q supercomputer. The benchmark has 251 million PHOLD logical processes and was executed in several configurations up to a peak of 7.86 million MPI tasks running on 1,966,080 cores. At the largest scale it processed 33 trillion events in 65 seconds, yielding a sustained speed of 504 billion events/second using 120 racks of Sequoia. This is by far the highest event rate reported by any parallel discrete event simulation to date, whether running PHOLD or any other benchmark. Additionally, we believe it is likely to be the largest number of MPI tasks ever used in any computation of any kind to date.
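The rollback idea can be illustrated with a toy sketch. This is not ROSS code: the class `LogicalProcess`, the `Event` type, and the handler `state = 2 * state + delta` are all hypothetical, chosen only because the handler is order-sensitive and exactly invertible. The sketch covers a single logical process and omits antimessages, which a real Time Warp system would use to cancel messages sent by rolled-back events.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    ts: float                          # virtual timestamp (the only ordering key)
    delta: int = field(compare=False)  # payload consumed by the handler


class LogicalProcess:
    """Toy LP that executes events optimistically and undoes them on demand."""

    def __init__(self):
        self.state = 0
        self.pending = []    # min-heap of unprocessed events
        self.processed = []  # already executed events, in timestamp order
        self.rollbacks = 0

    def forward(self, ev):
        # Hypothetical handler: the result depends on the execution order.
        self.state = self.state * 2 + ev.delta

    def reverse(self, ev):
        # Exact inverse of forward(), so no state snapshot is needed.
        self.state = (self.state - ev.delta) // 2

    def receive(self, ev):
        # A straggler has a timestamp earlier than work already executed.
        if self.processed and self.processed[-1].ts > ev.ts:
            self.rollbacks += 1
            while self.processed and self.processed[-1].ts > ev.ts:
                undone = self.processed.pop()
                self.reverse(undone)                  # undo, newest first
                heapq.heappush(self.pending, undone)  # schedule for re-execution
        heapq.heappush(self.pending, ev)

    def run(self):
        # Execute everything pending without waiting for a safety guarantee.
        while self.pending:
            ev = heapq.heappop(self.pending)
            self.forward(ev)
            self.processed.append(ev)


def sequential(events):
    """Reference result: execute the events strictly in timestamp order."""
    state = 0
    for ev in sorted(events):
        state = state * 2 + ev.delta
    return state


def demo():
    early = [Event(1.0, 1), Event(3.0, 0), Event(4.0, 1)]
    straggler = Event(2.0, 1)
    lp = LogicalProcess()
    for ev in early:
        lp.receive(ev)
    lp.run()               # optimistic execution up to virtual time 4.0
    lp.receive(straggler)  # forces a rollback of the events at 3.0 and 4.0
    lp.run()
    return lp.state, lp.rollbacks, sequential(early + [straggler])


if __name__ == "__main__":
    print(demo())
```

In this sketch the optimistic run and the sequential reference end in the same state, which is the correctness property that rollback is meant to preserve.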

ROSS exhibited a super-linear speedup throughout the strong scaling study, with more than a 97x speed improvement from scaling the number of cores by only 60x (from 32,768 to 1,966,080). We attribute this to significant cache-related performance acceleration as we moved to higher scales with fewer LPs per core.
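The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The inputs below are the rounded values from the text (33 trillion events, 7.86 million tasks, 251 million LPs), so the derived numbers are approximate; the variable and function names are ours, not the paper's.

```python
# Rounded figures taken from the abstract.
EVENTS = 33e12            # events processed at the largest scale
SECONDS = 65              # wall-clock time for that run
REPORTED_RATE = 504e9     # events/second as reported
MPI_TASKS = 7.86e6
CORES_MAX = 1_966_080
CORES_MIN = 32_768
RACKS = 120
LPS = 251e6               # PHOLD logical processes
SPEEDUP = 97              # "more than 97x", so a lower bound


def summarize():
    """Derive per-core and scaling quantities from the quoted figures."""
    core_ratio = CORES_MAX / CORES_MIN
    return {
        "rate_from_rounded_inputs": EVENTS / SECONDS,
        "tasks_per_core": MPI_TASKS / CORES_MAX,
        "cores_per_rack": CORES_MAX / RACKS,
        "core_ratio": core_ratio,
        # Values above 1.0 indicate super-linear strong scaling.
        "scaling_efficiency": SPEEDUP / core_ratio,
        "lps_per_core_at_min": LPS / CORES_MIN,
        "lps_per_core_at_max": LPS / CORES_MAX,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

The LP count per core falls from several thousand at 32,768 cores to roughly 128 at full scale, which is consistent with the cache-based explanation given for the super-linear speedup.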

Prompted by historical performance results, we propose a new, long-term performance metric called Warp Speed that grows logarithmically with the PHOLD event rate. As we define it, our maximum speed of 504 billion PHOLD events/sec corresponds to Warp 2.7.
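The abstract does not spell out the formula, only that the metric is logarithmic in the event rate and that 504 billion events/sec maps to Warp 2.7. A minimal sketch, assuming a base-10 logarithm with an offset of 9 (an assumption chosen because it reproduces that data point), might look like this; the function names are hypothetical.

```python
import math

ASSUMED_OFFSET = 9.0  # assumption: Warp 0 corresponds to 1e9 events/second


def warp_speed(events_per_second, offset=ASSUMED_OFFSET):
    """Map an event rate to a logarithmic 'Warp' value (assumed definition)."""
    if events_per_second <= 0:
        raise ValueError("event rate must be positive")
    return math.log10(events_per_second) - offset


def event_rate(warp, offset=ASSUMED_OFFSET):
    """Inverse mapping: the event rate implied by a given Warp value."""
    return 10.0 ** (warp + offset)


if __name__ == "__main__":
    print(round(warp_speed(504e9), 1))
```

Under this assumed definition each whole Warp step corresponds to a tenfold increase in event rate, which is what makes the metric suitable for tracking progress over long periods.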

We suggest that the results described here are significant because they demonstrate that direct simulation of planetary-scale discrete event models is now, in principle at least, within reach.


Index Terms

Warp speed: executing time warp on 1,966,080 cores
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Computing methodologies
  1. Modeling and simulation
    1. Simulation types and techniques
      1. Discrete-event simulation
      2. Massively parallel and high-performance simulations

Published in
SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2013, 426 pages
ISBN: 9781450319201
DOI: 10.1145/2486092
General Chair: Margaret L. Loper, Georgia Institute of Technology, USA
Program Chair: Gabriel A. Wainer, Carleton University, Canada
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
blue gene/q
parallel discrete-event simulation
time warp
Acceptance Rates
SIGSIM PADS '13 Paper Acceptance Rate: 29 of 75 submissions, 39%. Overall Acceptance Rate: 398 of 779 submissions, 51%.