Abstract
We present sam(oa)2, a software package for a dynamically adaptive, parallel solution of 2D partial differential equations on triangular grids created via newest vertex bisection. An element order imposed by the Sierpinski space-filling curve provides an algorithm for grid generation, refinement, and traversal that is inherently memory efficient. Based purely on stack and stream data structures, it completely avoids random memory access. Using an element-oriented data view suitable for local operators, concrete simulation scenarios are implemented based on control loops and event hooks, which hide the complexity of the underlying traversal scheme. Two case studies are presented: two-phase flow in heterogeneous porous media and tsunami wave propagation, demonstrated on the Tohoku tsunami 2011 in Japan.
sam(oa)2 features hybrid MPI+OpenMP parallelization based on the Sierpinski order induced on the elements. Sections defined by contiguous grid cells define atomic tasks for OpenMP work sharing and stealing, as well as for migration of grid cells between MPI processes. Using optimized communication and load balancing algorithms, sam(oa)2 achieves 88% strong scaling efficiency from 16 to 512 cores and 92% efficiency in a weak scaling test on 8,192 cores with 10 billion elements—all tests including adaptive mesh refinement and load balancing in each time step.
- M. S. Alnæs, A. Logg, K. B. Ølgaard, M. E. Rognes, and G. N. Wells. 2014. Unified form language: A domain-specific language for weak formulations of partial differential equations. ACM Trans. Math. Softw. 40, 2, Article 9 (March 2014), 37 pages. DOI:http://dx.doi.org/10.1145/2566630 Google ScholarDigital Library
- M. Bader, C. Böck, J. Schwaiger, and C. Vigh. 2010. Dynamically adaptive simulations with minimal memory requirement—Solving the shallow water equations using Sierpinski curves. SIAM J. Scientific Computing 32, 1 (2010), 212--228. DOI:http://dx.doi.org/10.1137/080728871Google ScholarDigital Library
- M. Bader, A. Breuer, W. Hölzl, and S. Rettenberger. 2014. Vectorization of an augmented Riemann solver for the shallow water equations. In Proceedings of the 2014 International Conference on High Performance Computing and Simulation (HPCS’14). IEEE, 193--201.Google Scholar
- M. Bader, K. Rahnema, and C. Vigh. 2011. Memory-efficient Sierpinski-order traversals on dynamically adaptive, recursively structured triangular grids. In Para 2010: State of the Art in Scientific and Parallel Computing, K. Jónasson (Ed.), Vol. 7134. Springer, Berlin, 302--312. DOI:http://dx.doi.org/ 10.1007/978-3-642-28145-7_30 Google ScholarDigital Library
- M. Bader, S. Schraufstetter, C. A. Vigh, and J. Behrens. 2008. Memory efficient adaptive mesh generation and implementation of multigrid algorithms using Sierpinski curves. Int. J. Comput. Sci. Eng. 4, 1 (2008), 12--21. DOI:http://dx.doi.org/10.1504/IJCSE.2008.021108 Google ScholarDigital Library
- P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, M. Ohlberger, and O. Sander. 2008. A generic grid interface for parallel and adaptive scientific computing. Part I: Abstract framework. Computing 82, 2--3 (2008), 103--119. DOI:http://dx.doi.org/10.1007/s00607-008-0003-x Google ScholarDigital Library
- J. Behrens and J. Zimmermann. 2000. Parallelizing an unstructured grid generator with a space-filling curve approach. In Euro-Par 2000 Parallel Processing (Lecture Notes in Computer Science), A. Bode, T. Ludwig, W. Karl, and R. Wismüller (Eds.), Vol. 1900. Springer, Berlin, 815--823. DOI:http://dx.doi.org/10.1007/3-540-44520-X_112 Google ScholarDigital Library
- P. Binning and M. A. Celia. 1999. Practical implementation of the fractional flow approach to multi-phase flow simulation. Adv. Water Resources 22, 5 (1999), 461--478. DOI:http://dx.doi.org/ 10.1016/S0309-1708(98)00022-0Google ScholarCross Ref
- H.-J. Bungartz, M. Mehl, T. Neckel, and T. Weinzierl. 2010. The PDE framework Peano applied to fluid dynamics: An efficient implementation of a parallel multiscale fluid dynamics solver on octree-like adaptive Cartesian grids. Comput. Mechanics 46, 1 (2010), 103--114. DOI:http://dx.doi.org/ 10.1007/s00466-009-0436-xGoogle ScholarCross Ref
- C. Burstedde, D. Calhoun, K. T. Mandli, and A. R. Terrel. 2013. ForestClaw: Hybrid forest-of-octrees AMR for hyperbolic conservation laws. In Parallel Computing -- Accelerating Computational Science and Engineering (CSE’13) (Advances in Parallel Computing), Vol. 25. IOS Press, 253--262. DOI:http://dx.doi.org/ 10.3233/978-1-61499-381-0-253Google Scholar
- C. Burstedde, L. C. Wilcox, and O. Ghattas. 2011. p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees. SIAM J. Sci. Comput. 33, 3 (2011), 1103--1133. DOI:http://dx.doi.org/ 10.1137/100791634 Google ScholarDigital Library
- C. C. Chueh, M. Secanell, W. Bangerth, and N. Djilali. 2010. Multi-level adaptive simulation of transient two-phase flow in heterogeneous porous media. Comput. Fluids 39, 9 (2010), 1585--1596. DOI:http://dx.doi.org/ 10.1016/j.compfluid.2010.05.011Google ScholarCross Ref
- P. Galvez, J.-P. Ampuero, L. A. Dalguer, S. Somala, and T. Nissen-Meyer. 2013. Dynamic earthquake rupture modeled with an unstructured 3d spectral element method applied to the 2011 M9 Tohoku earthquake. Geophys. J. Int. 198, 2 (December 2013), 1222--1240. DOI:http://dx.doi.org/10.1093/gji/ggu203Google Scholar
- D. L. George. 2008. Augmented Riemann solvers for the shallow water equations over variable topography with steady states and inundation. J. Comput. Phys. 227, 6 (2008), 3089--3113. DOI:http://dx.doi.org/10.1016/j.jcp.2007.10.027 Google ScholarDigital Library
- J. Horrillo, Z. Kowalik, and Y. Shigihara. 2006. Wave dispersion study in the indian ocean-tsunami of December 26, 2004. Marine Geodesy 29, 3 (2006), 149--166. DOI:http://dx.doi.org/10.1080/01490410600939140Google ScholarCross Ref
- D. Kalise, I. Lie, and E. F. Toro. 2011. High-order finite volume schemes for layered atmospheric models. CoRR abs/1110.6834 (2011).Google Scholar
- R. J. LeVeque. 2002. Finite Volume Methods for Hyperbolic Problems. Cambridge University Press.Google Scholar
- R. J. LeVeque, D. L. George, and M. J. Berger. 2011. Tsunami modelling with adaptively refined finite volume methods. Acta Numer. 20 (5 2011), 211--289. DOI:http://dx.doi.org/10.1017/S0962492911000043Google Scholar
- A. Logg and G. N. Wells. 2010. DOLFIN: Automated finite element computing. ACM Trans. Math. Softw. 37, 2, Article 20 (April 2010), 28 pages. DOI:http://dx.doi.org/10.1145/1731022.1731030 Google ScholarDigital Library
- K. T. Mandli. 2013. A numerical method for the two layer shallow water equations with dry states. Ocean Modelling 72, (2013), 80--91. DOI:http://dx.doi.org/10.1016/j.ocemod.2013.08.001Google ScholarCross Ref
- J. D. McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995), 19--25.Google Scholar
- O. Meister and M. Bader. 2015. 2D adaptivity for 3d problems: Parallel SPE10 reservoir simulation on dynamically adaptive prism grids. J. Comput. Sci. 9, (May 2015), 101--106.Google ScholarCross Ref
- O. Meister, K. Rahnema, and M. Bader. 2012. A software concept for cache-efficient simulation on dynamically adaptive structured triangular grids. In Applications, Tools and Techniques on the Road to Exascale Computing (Advances in Parallel Computing), K. De Boschhere, E. H. D’Hollander, G. R. Joubert, D. Padua, and F. Peters (Eds.), Vol. 22. ParCo 2012, IOS Press, 251--260. DOI:http://dx.doi.org/10.3233/978-1-61499-041-3-251Google Scholar
- G. Meurant. 1987. Multitasking the conjugate gradient method on the CRAY X-MP/48. Parallel Comput. 5, 3 (1987), 267--280. DOI:http://dx.doi.org/10.1016/0167-8191(87)90037-8Google ScholarCross Ref
- W. F. Mitchell. 1991. Adaptive refinement for arbitrary finite-element spaces with hierarchical bases. J. Comput. Appl. Math. 36 (1991), 65--78. Google ScholarDigital Library
- W. F. Mitchell. 2007. A refinement-tree based partitioning method for dynamic load balancing with adaptively refined grids. J. Parallel Distrib. Comput. 67, 4 (2007), 417--429. Google ScholarDigital Library
- R. Pajarola. 1998. Large scale terrain visualization using the restricted quadtree triangulation. In Proceedings of the Conference on Visualization (VIS’98). IEEE Computer Society Press, Los Alamitos, CA, USA, 19--26. Google ScholarDigital Library
- A. Pinar and C. Aykanat. 2004. Fast optimal load balancing algorithms for 1D partitioning. J. Parallel Distrib. Comput. 64, 8 (2004), 974--996. DOI:http://dx.doi.org/10.1016/j.jpdc.2004.05.003 Google ScholarDigital Library
- S. Popinet. 2011. Quadtree-adaptive tsunami modelling. Ocean Dynamics 61, 9 (2011), 1261--1285. DOI:http://dx.doi.org/10.1007/s10236-011-0438-zGoogle ScholarCross Ref
- S. Rettenberger. 2012. Ein Paralleler Server Für Adaptive Geoinformation in Strömungssimulationen. Master’s thesis. Institut für Informatik, Technische Universität München.Google Scholar
- Y. Saad. 1985. Practical use of polynomial preconditionings for the conjugate gradient method. SIAM J. Sci. Statist. Comput. 6, 4 (1985), 865--881. DOI:http://dx.doi.org/10.1137/0906059Google ScholarDigital Library
- K. Schloegel, G. Karypis, V. Kumar, R. Biswas, and L. Oliker. 1998. A performance study of diffusive vs. remapped load-balancing schemes. In ISCA 11th International Conference on Parallel and Distributed Computing Systems. International Society for Computers and Their Applications, 59--66.Google Scholar
- M. Schreiber, H.-J. Bungartz, and M. Bader. 2012. Shared memory parallelization of fully-adaptive simulations using a dynamic tree-split and -join approach. In IEEE International Conference on High Performance Computing (HiPC’12), IEEE Xplore, 1--10. DOI:http://dx.doi.org/10.1109/HiPC.2012.6507479Google Scholar
- J. R. Shewchuk. 1994. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Technical Report. Carnegie Mellon University, Pittsburgh, PA. Google ScholarDigital Library
- L. Velho, L. de Figueiredo, and J. Gomes. 1999. Hierarchical generalized triangle strips. Vis. Comput. 15, 1 (1999), 21--35. DOI:http://dx.doi.org/10.1007/s003710050160Google ScholarCross Ref
- C. A. Vigh. 2012. Parallel Simulation of the Shallow Water Equations on Structured Dynamically Adaptive Triangular Grids. Dissertation. Technische Universitt München, München.Google Scholar
- T. Weinzierl and M. Mehl. 2011. Peano -- A traversal and storage scheme for octree-like adaptive Cartesian multiscale grids. SIAM J. Sci. Comput. 33, 5 (2011), 2732--2760. DOI:http://dx.doi.org/10.1137/100799071 Google ScholarDigital Library
- T. Weinzierl, R. Wittmann, K. Unterweger, M. Bader, A. Breuer, and S. Rettenberger. 2013. Hardware-aware block size tailoring on adaptive spacetree grids for shallow water waves. In 1st International Workshop on High-Performance Stencil Computations (HiStencils’14). HiStencils, 57--64.Google Scholar
- S. Williams, A. Waterman, and D. Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (April 2009), 65--76. DOI:http://dx.doi.org/k10.1145/ 1498765.1498785 Google ScholarDigital Library
- G. Zumbusch. 2001. On the quality of space-filling curve induced partitions. Zeitschrift für Angewandte Mathematik und Mechanik 81, Suppl. 1 (2001), 25--28. DOI:http://dx.doi.org/10.1007/3-540-47789-6_4Google Scholar
- G. Zumbusch. 2002. Load balancing for adaptively refined grids. Proc. Appl. Math. Mechanics 1 (2002), 534--537.Google ScholarCross Ref
Index Terms
- Parallel Memory-Efficient Adaptive Mesh Refinement on Structured Triangular Meshes with Billions of Grid Cells
Recommendations
p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees
We present scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a ...
Adaptive mesh coarsening for quadrilateral and hexahedral meshes
Mesh adaptation methods can improve the efficiency and accuracy of solutions to computational modeling problems. In many applications involving quadrilateral and hexahedral meshes, local modifications which maintain the original element type are ...
Parallel performance of h-type Adaptive Mesh Refinement for Nek5000
EASC '16: Proceedings of the Exascale Applications and Software Conference 2016We discuss parallel performance of h-type Adaptive Mesh Refinement (AMR) developed for the high-order spectral element solver Nek5000 within CRESTA project. AMR is a desired feature of the future simulation software, as it gives possibility to increase ...
Comments