research-article

Open Access

Load Balancing and Patch-Based Parallel Adaptive Mesh Refinement for Tsunami Simulation on Heterogeneous Platforms Using Xeon Phi Coprocessors

Authors:
Chaulio R. Ferreira, Technical University of Munich, Department of Informatics, Garching, Germany
Michael Bader, Technical University of Munich, Department of Informatics, Garching, Germany

PASC '17: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2017, Article No.: 12, Pages 1–12, https://doi.org/10.1145/3093172.3093237

Published: 26 June 2017

ABSTRACT

We present a patch-based approach for tsunami simulation with parallel adaptive mesh refinement on the Salomon supercomputer. The special architecture of Salomon, with two Intel Xeon CPUs (Haswell architecture) and two Intel Xeon Phi coprocessors (Knights Corner) per compute node, suggests truly heterogeneous load balancing instead of offload approaches, because host and accelerator achieve comparable performance for our simulations.

We use a tree-structured mesh refinement strategy resulting from newest-vertex bisection of triangular grid cells, but introduce small uniform grid patches into the leaves of the tree to allow vectorisation of the Finite Volume solver over grid cells. In particular, we implemented vectorised versions of the approximate Riemann solvers, exploiting Fortran's array notation where possible. While large patches increase computational performance due to vectorisation, improved memory access and reduced meshing overhead, they also increase the overall number of processed cells. Thus, a trade-off must be found regarding the patch size. We experimented with different patch sizes in a study of the time-to-solution of a simulation of the 2011 Tohoku tsunami, and found that relatively small patches with 8² cells resulted in the smallest execution times.
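To illustrate what vectorising a Riemann solver over the cells of a patch can look like, the following is a minimal sketch and not the authors' implementation. It swaps in NumPy array expressions for Fortran array notation and uses a simple local Lax-Friedrichs (Rusanov) flux for the 1D shallow water equations instead of the approximate Riemann solvers used in the paper. The function name `rusanov_flux`, its argument layout, and the restriction to wet cells (h > 0) are assumptions made for this example.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def rusanov_flux(h_l, hu_l, h_r, hu_r):
    """Rusanov flux for the 1D shallow water equations on many edges at once.

    Hypothetical helper: each argument is an array holding one value per
    edge of a patch (water height h and momentum hu on the left and right
    side of the edge). Assumes h > 0 everywhere, i.e. no dry cells.
    """
    u_l = hu_l / h_l
    u_r = hu_r / h_r

    # Physical fluxes of mass and momentum on both sides of every edge.
    f_h_l, f_hu_l = hu_l, hu_l * u_l + 0.5 * G * h_l**2
    f_h_r, f_hu_r = hu_r, hu_r * u_r + 0.5 * G * h_r**2

    # Largest characteristic speed per edge.
    s = np.maximum(np.abs(u_l) + np.sqrt(G * h_l),
                   np.abs(u_r) + np.sqrt(G * h_r))

    # Central flux plus dissipation, computed without an explicit loop.
    flux_h = 0.5 * (f_h_l + f_h_r) - 0.5 * s * (h_r - h_l)
    flux_hu = 0.5 * (f_hu_l + f_hu_r) - 0.5 * s * (hu_r - hu_l)
    return flux_h, flux_hu
```

Because every operation acts on whole arrays, the work per patch is expressed as a handful of long, regular loops, which is the property that makes uniform patches attractive for SIMD hardware. Calling the function with identical left and right states at rest yields zero mass flux and the hydrostatic pressure term as momentum flux.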

We use the Xeon Phis in symmetric mode and apply heterogeneous load balancing between hosts and coprocessors, identifying the relative load distribution either from on-the-fly runtime measurements or from a priori exhaustive testing. Both approaches perform better than homogeneous load balancing and better than using only the CPUs or only the Xeon Phi coprocessors in native mode. In all set-ups, however, the absolute speedups are impeded by the slow MPI communication between Xeon Phi coprocessors.
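The idea of deriving the load distribution from runtime measurements can be sketched as follows. This is an illustrative example under stated assumptions, not the scheme implemented in the paper: it assumes the mesh is already ordered as a 1D sequence of sections with known costs, estimates each rank's relative speed as processed load divided by measured wall time, and then cuts the sequence greedily into contiguous chunks whose sizes follow those speeds. The names `estimate_speeds` and `partition` are hypothetical.

```python
from itertools import accumulate


def estimate_speeds(loads, times):
    """Relative throughput per rank from its last load and wall time."""
    raw = [load / t for load, t in zip(loads, times)]
    total = sum(raw)
    return [r / total for r in raw]


def partition(costs, speeds):
    """Split a 1D list of section costs into one contiguous chunk per rank.

    `speeds` are relative speeds that sum to 1. Returns half-open index
    ranges (start, end), one per rank, chosen greedily so that each cut
    lies as close as possible to the rank's cumulative target load.
    """
    total = sum(costs)
    prefix = [0.0] + list(accumulate(costs))
    targets = list(accumulate(s * total for s in speeds))

    bounds = [0]
    i = 0
    for target in targets[:-1]:
        # Move the cut forward while that brings it closer to the target.
        while i < len(costs) and abs(prefix[i + 1] - target) <= abs(prefix[i] - target):
            i += 1
        bounds.append(i)
    bounds.append(len(costs))
    return list(zip(bounds[:-1], bounds[1:]))


# Two ranks processed the same load, but the second needed twice as long.
speeds = estimate_speeds(loads=[6.0, 6.0], times=[1.0, 2.0])
chunks = partition([1.0] * 12, speeds)
```

With homogeneous load balancing both ranks would receive six sections each; here the faster rank receives roughly twice as many as the slower one. Relative speeds obtained from a priori testing could be passed to `partition` in the same way.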

Published in

PASC '17: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2017
136 pages
ISBN: 9781450350624
DOI: 10.1145/3093172

Copyright © 2017 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Parallel adaptive mesh refinement
Xeon Phi coprocessor
load balancing on heterogeneous systems
patch-based adaptivity
tsunami simulation
vectorisation
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
PASC '17 paper acceptance rate: 13 of 33 submissions, 39%
Overall acceptance rate: 109 of 221 submissions, 49%