Research Article · Open Access · DOI: 10.1145/3093172.3093237

Load Balancing and Patch-Based Parallel Adaptive Mesh Refinement for Tsunami Simulation on Heterogeneous Platforms Using Xeon Phi Coprocessors

Published: 26 June 2017

ABSTRACT

We present a patch-based approach for tsunami simulation with parallel adaptive mesh refinement on the Salomon supercomputer. The special architecture of Salomon, with two Intel Xeon CPUs (Haswell architecture) and two Intel Xeon Phi coprocessors (Knights Corner) per compute node, suggests truly heterogeneous load balancing instead of offload approaches, because host and accelerator achieve comparable performance for our simulations.

We use a tree-structured mesh refinement strategy resulting from newest-vertex bisection of triangular grid cells, but introduce small uniform grid patches into the leaves of the tree to allow vectorisation of the Finite Volume solver over grid cells. In particular, we implemented vectorised versions of the approximate Riemann solvers, exploiting Fortran's array notations where possible. While large patches increase computational performance due to vectorisation, improved memory access and reduced meshing overhead, they also increase the overall number of processed cells. Thus, a trade-off must be found regarding the patch size. We experimented with different patch sizes in a study of the time-to-solution of a simulation of the 2011 Tohoku tsunami, and found that relatively small patches with 82 cells resulted in the smallest execution times.
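The core idea of vectorising the Riemann solver over the cells of a uniform patch can be illustrated with a minimal sketch. The following is not the paper's augmented Riemann solver, and the paper's implementation uses Fortran array notation rather than NumPy; as an assumption for illustration, a simple Rusanov (local Lax-Friedrichs) flux for the 1D shallow water equations is evaluated for whole arrays of left/right states at once instead of one interface at a time:

```python
import numpy as np

def rusanov_flux(hL, huL, hR, huR, g=9.81):
    """Vectorised Rusanov flux for the 1D shallow water equations.

    All arguments are arrays holding the left/right states of every
    interface in a patch, so one call updates the whole patch -- the
    same pattern that lets a compiler vectorise the per-cell work.
    """
    uL = huL / hL
    uR = huR / hR
    # physical fluxes F(q) = (hu, hu^2 + g h^2 / 2), per interface side
    fL0, fL1 = huL, huL * uL + 0.5 * g * hL**2
    fR0, fR1 = huR, huR * uR + 0.5 * g * hR**2
    # maximal signal speed |u| + sqrt(g h) per interface
    s = np.maximum(np.abs(uL) + np.sqrt(g * hL),
                   np.abs(uR) + np.sqrt(g * hR))
    # centred flux plus speed-scaled dissipation
    f0 = 0.5 * (fL0 + fR0) - 0.5 * s * (hR - hL)
    f1 = 0.5 * (fL1 + fR1) - 0.5 * s * (huR - huL)
    return f0, f1
```

For a lake-at-rest state (equal depths, zero momentum) the mass flux vanishes and the momentum flux reduces to the hydrostatic pressure term, which is a quick sanity check for such a kernel.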

We use the Xeon Phis in symmetric mode and apply heterogeneous load balancing between hosts and coprocessors, identifying the relative load distribution either from on-the-fly runtime measurements or from a priori exhaustive testing. Both approaches perform better than homogeneous load balancing and better than using only the CPUs or only the Xeon Phi coprocessors in native mode. In all set-ups, however, the absolute speedups are impeded by the slow MPI communication between Xeon Phi coprocessors.
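The measurement-based load balancing described above amounts to partitioning the (space-filling-curve-ordered) cell sequence in proportion to each rank's observed throughput. The sketch below is a hypothetical helper, not the paper's exact scheme: `measured_times[i]` is assumed to be the time rank `i` needed for its `current_cells[i]` cells in the previous step, from which relative speeds and new shares are derived:

```python
def heterogeneous_partition(total_cells, measured_times, current_cells):
    """Split `total_cells` among ranks in proportion to measured speed.

    Speed of rank i is estimated as cells processed per unit time in
    the last step; shares are rounded down and the remainder is given
    to the last rank so the counts always sum to `total_cells`.
    """
    speeds = [c / t for c, t in zip(current_cells, measured_times)]
    total_speed = sum(speeds)
    shares = [int(total_cells * s / total_speed) for s in speeds]
    shares[-1] += total_cells - sum(shares)
    return shares
```

With a host rank twice as fast as a coprocessor rank, the host receives roughly two thirds of the cells; a homogeneous partition would instead split them evenly and leave the faster rank idle.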


Published in

PASC '17: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2017, 136 pages
ISBN: 9781450350624
DOI: 10.1145/3093172

Copyright © 2017 Owner/Author. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PASC '17 paper acceptance rate: 13 of 33 submissions (39%). Overall acceptance rate: 83 of 185 submissions (45%).
