Can PDES scale in environments with heterogeneous delays?

Published: 19 May 2013

ABSTRACT

The performance and scalability of Parallel Discrete Event Simulation (PDES) are often limited by communication latencies and overheads. The emergence of multi-core processors, and their expected evolution into many-cores, promises low-latency communication and tight memory integration between cores; these properties should significantly improve the performance of PDES in such environments. However, on clusters of multi-cores (CMs), the latency and processing overheads incurred when communicating between different machines (nodes) far outweigh those between cores on the same chip, especially when commodity networking fabrics and communication software are used. It is unclear whether the low latency among cores on the same node offers any benefit, given that communication links across nodes are significantly worse. In this study, we examine the performance of a multi-threaded implementation of PDES on CMs. We demonstrate that inter-node communication costs impose a substantial bottleneck on PDES and that, without optimizations addressing these long latencies, multi-threaded PDES does not significantly outperform the multi-process version, despite direct communication through shared memory on the individual nodes. We then propose three optimizations: message consolidation and routing, infrequent polling, and latency-sensitive model partitioning. We show that with these optimizations in place, a threaded implementation of PDES significantly outperforms a process-based implementation even on CMs.
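The abstract names message consolidation as one optimization but does not describe its mechanism. As a rough illustration of the general idea only, and not the authors' implementation, the Python sketch below buffers events bound for remote nodes and sends them as one batch, while events for the local node bypass the buffer. The class `ConsolidatingSender`, the `send_batch` callback, and the `batch_size` threshold are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class ConsolidatingSender:
    """Hypothetical sketch: batch events per remote node before sending."""

    def __init__(self, local_node, send_batch, batch_size=4):
        self.local_node = local_node      # id of the node this sender runs on
        self.send_batch = send_batch      # callable(node_id, events); stands in for a network send
        self.batch_size = batch_size      # flush threshold (assumed tuning knob)
        self.buffers = defaultdict(list)  # destination node id -> pending events
        self.local_inbox = []             # stands in for a shared-memory queue

    def send(self, dest_node, event):
        if dest_node == self.local_node:
            # Same node: deliver directly, no batching needed.
            self.local_inbox.append(event)
            return
        pending = self.buffers[dest_node]
        pending.append(event)
        if len(pending) >= self.batch_size:
            self.flush(dest_node)

    def flush(self, dest_node):
        # Send everything queued for one node as a single message.
        pending = self.buffers.pop(dest_node, [])
        if pending:
            self.send_batch(dest_node, pending)

    def flush_all(self):
        # Drain every buffer, e.g. before a synchronization point.
        for dest_node in list(self.buffers):
            self.flush(dest_node)
```

In this sketch, a larger `batch_size` amortizes the per-message inter-node overhead across more events, at the cost of delaying delivery of the buffered events; how the paper balances that trade-off is not stated in the abstract.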

      • Published in

        SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
        May 2013
        426 pages
        ISBN:9781450319201
        DOI:10.1145/2486092

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        SIGSIM PADS '13 Paper Acceptance Rate: 29 of 75 submissions, 39%. Overall Acceptance Rate: 398 of 779 submissions, 51%.
