Warp speed: executing time warp on 1,966,080 cores

Published: 19 May 2013

ABSTRACT

Time Warp is an optimistic synchronization protocol for parallel discrete event simulation that coordinates the available parallelism through its rollback and antimessage mechanisms. In this paper we present the results of a strong scaling study of the ROSS simulator running Time Warp with reverse computation and executing the well-known PHOLD benchmark on Lawrence Livermore National Laboratory's Sequoia Blue Gene/Q supercomputer. The benchmark has 251 million PHOLD logical processes and was executed in several configurations up to a peak of 7.86 million MPI tasks running on 1,966,080 cores. At the largest scale it processed 33 trillion events in 65 seconds, yielding a sustained speed of 504 billion events/second using 120 racks of Sequoia. This is by far the highest event rate reported by any parallel discrete event simulation to date, whether running PHOLD or any other benchmark. Additionally, we believe it is likely to be the largest number of MPI tasks ever used in any computation of any kind to date.
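The headline figures above can be cross-checked with a little arithmetic. The sketch below uses only the rounded values quoted in the abstract, so its results are approximate; the variable names are illustrative and not taken from ROSS.

```python
# Rounded figures as quoted in the abstract (not exact counts).
events = 33e12        # events processed at the largest scale
seconds = 65          # wall-clock run time in seconds
mpi_tasks = 7.86e6    # peak number of MPI tasks
cores = 1_966_080     # cores used at the largest scale
racks = 120           # Sequoia racks used
lps = 251e6           # PHOLD logical processes

# Event rate implied by the rounded inputs; the abstract reports 504 billion/s,
# so a small gap from rounding is expected.
rate = events / seconds

tasks_per_core = mpi_tasks / cores   # close to 4 MPI tasks per core
cores_per_rack = cores // racks      # divides evenly
lps_per_task = lps / mpi_tasks       # close to 32 LPs per MPI task

print(f"event rate     : {rate:.3e} events/s")
print(f"tasks per core : {tasks_per_core:.2f}")
print(f"cores per rack : {cores_per_rack}")
print(f"LPs per task   : {lps_per_task:.1f}")
```

The implied rate lands within about one percent of the reported 504 billion events/second, which is consistent with the event count and run time having been rounded.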

ROSS exhibited a super-linear speedup throughout the strong scaling study, with more than a 97x speed improvement from scaling the number of cores by only 60x (from 32,768 to 1,966,080). We attribute this to significant cache-related performance acceleration as we moved to higher scales with fewer LPs per core.
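To make "super-linear" concrete, the speedup can be divided by the growth in core count; a ratio above 1.0 means the speedup outpaced the added hardware. This is a minimal sketch using the figures quoted above, treating the "more than 97x" speedup as exactly 97 (a lower bound). The function name is hypothetical.

```python
def scaling_efficiency(speedup, base_cores, scaled_cores):
    """Return speedup divided by the core-count ratio (1.0 = ideal linear scaling)."""
    core_ratio = scaled_cores / base_cores
    return speedup / core_ratio

core_ratio = 1_966_080 / 32_768                      # growth in core count
eff = scaling_efficiency(97, 32_768, 1_966_080)      # 97x is a lower bound

print(f"core ratio : {core_ratio:.0f}x")
print(f"efficiency : {eff:.2f} (values above 1.0 are super-linear)")
```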

Prompted by historical performance results, we propose a new long-term performance metric called Warp Speed that grows logarithmically with the PHOLD event rate. As we define it, our maximum speed of 504 billion PHOLD events/sec corresponds to Warp 2.7.
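The abstract does not state the formula for Warp Speed. One logarithmic definition that reproduces the quoted value is W = log10(rate) - 9, with the rate in events per second. The offset of 9 is an assumption made here because it maps 504 billion events/sec to roughly Warp 2.7; the sketch below illustrates that assumed form only.

```python
import math

def warp_speed(events_per_second, offset=9.0):
    """Assumed form of the metric: log10 of the event rate minus a fixed offset."""
    return math.log10(events_per_second) - offset

w = warp_speed(504e9)   # peak PHOLD rate quoted in the abstract
print(f"Warp {w:.1f}")
```

Under this assumed form, each whole step on the scale corresponds to a tenfold increase in event rate.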

We suggest that the results described here are significant because they demonstrate that direct simulation of planetary-scale discrete event models is now, in principle at least, within reach.


Published in

SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2013
426 pages
ISBN: 9781450319201
DOI: 10.1145/2486092

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGSIM PADS '13 Paper Acceptance Rate: 29 of 75 submissions, 39%
Overall Acceptance Rate: 398 of 779 submissions, 51%
