skip to main content
research-article
Open Access

Parallel Memory-Efficient Adaptive Mesh Refinement on Structured Triangular Meshes with Billions of Grid Cells

Authors Info & Claims
Published:15 September 2016Publication History
Skip Abstract Section

Abstract

We present sam(oa)2, a software package for a dynamically adaptive, parallel solution of 2D partial differential equations on triangular grids created via newest vertex bisection. An element order imposed by the Sierpinski space-filling curve provides an algorithm for grid generation, refinement, and traversal that is inherently memory efficient. Based purely on stack and stream data structures, it completely avoids random memory access. Using an element-oriented data view suitable for local operators, concrete simulation scenarios are implemented based on control loops and event hooks, which hide the complexity of the underlying traversal scheme. Two case studies are presented: two-phase flow in heterogeneous porous media and tsunami wave propagation, demonstrated on the Tohoku tsunami 2011 in Japan.

sam(oa)2 features hybrid MPI+OpenMP parallelization based on the Sierpinski order induced on the elements. Sections defined by contiguous grid cells define atomic tasks for OpenMP work sharing and stealing, as well as for migration of grid cells between MPI processes. Using optimized communication and load balancing algorithms, sam(oa)2 achieves 88% strong scaling efficiency from 16 to 512 cores and 92% efficiency in a weak scaling test on 8,192 cores with 10 billion elements—all tests including adaptive mesh refinement and load balancing in each time step.

References

  1. M. S. Alnæs, A. Logg, K. B. Ølgaard, M. E. Rognes, and G. N. Wells. 2014. Unified form language: A domain-specific language for weak formulations of partial differential equations. ACM Trans. Math. Softw. 40, 2, Article 9 (March 2014), 37 pages. DOI:http://dx.doi.org/10.1145/2566630 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Bader, C. Böck, J. Schwaiger, and C. Vigh. 2010. Dynamically adaptive simulations with minimal memory requirement—Solving the shallow water equations using Sierpinski curves. SIAM J. Scientific Computing 32, 1 (2010), 212--228. DOI:http://dx.doi.org/10.1137/080728871Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Bader, A. Breuer, W. Hölzl, and S. Rettenberger. 2014. Vectorization of an augmented Riemann solver for the shallow water equations. In Proceedings of the 2014 International Conference on High Performance Computing and Simulation (HPCS’14). IEEE, 193--201.Google ScholarGoogle Scholar
  4. M. Bader, K. Rahnema, and C. Vigh. 2011. Memory-efficient Sierpinski-order traversals on dynamically adaptive, recursively structured triangular grids. In Para 2010: State of the Art in Scientific and Parallel Computing, K. Jónasson (Ed.), Vol. 7134. Springer, Berlin, 302--312. DOI:http://dx.doi.org/ 10.1007/978-3-642-28145-7_30 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Bader, S. Schraufstetter, C. A. Vigh, and J. Behrens. 2008. Memory efficient adaptive mesh generation and implementation of multigrid algorithms using Sierpinski curves. Int. J. Comput. Sci. Eng. 4, 1 (2008), 12--21. DOI:http://dx.doi.org/10.1504/IJCSE.2008.021108 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, M. Ohlberger, and O. Sander. 2008. A generic grid interface for parallel and adaptive scientific computing. Part I: Abstract framework. Computing 82, 2--3 (2008), 103--119. DOI:http://dx.doi.org/10.1007/s00607-008-0003-x Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Behrens and J. Zimmermann. 2000. Parallelizing an unstructured grid generator with a space-filling curve approach. In Euro-Par 2000 Parallel Processing (Lecture Notes in Computer Science), A. Bode, T. Ludwig, W. Karl, and R. Wismüller (Eds.), Vol. 1900. Springer, Berlin, 815--823. DOI:http://dx.doi.org/10.1007/3-540-44520-X_112 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Binning and M. A. Celia. 1999. Practical implementation of the fractional flow approach to multi-phase flow simulation. Adv. Water Resources 22, 5 (1999), 461--478. DOI:http://dx.doi.org/ 10.1016/S0309-1708(98)00022-0Google ScholarGoogle ScholarCross RefCross Ref
  9. H.-J. Bungartz, M. Mehl, T. Neckel, and T. Weinzierl. 2010. The PDE framework Peano applied to fluid dynamics: An efficient implementation of a parallel multiscale fluid dynamics solver on octree-like adaptive Cartesian grids. Comput. Mechanics 46, 1 (2010), 103--114. DOI:http://dx.doi.org/ 10.1007/s00466-009-0436-xGoogle ScholarGoogle ScholarCross RefCross Ref
  10. C. Burstedde, D. Calhoun, K. T. Mandli, and A. R. Terrel. 2013. ForestClaw: Hybrid forest-of-octrees AMR for hyperbolic conservation laws. In Parallel Computing -- Accelerating Computational Science and Engineering (CSE’13) (Advances in Parallel Computing), Vol. 25. IOS Press, 253--262. DOI:http://dx.doi.org/ 10.3233/978-1-61499-381-0-253Google ScholarGoogle Scholar
  11. C. Burstedde, L. C. Wilcox, and O. Ghattas. 2011. p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees. SIAM J. Sci. Comput. 33, 3 (2011), 1103--1133. DOI:http://dx.doi.org/ 10.1137/100791634 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. C. Chueh, M. Secanell, W. Bangerth, and N. Djilali. 2010. Multi-level adaptive simulation of transient two-phase flow in heterogeneous porous media. Comput. Fluids 39, 9 (2010), 1585--1596. DOI:http://dx.doi.org/ 10.1016/j.compfluid.2010.05.011Google ScholarGoogle ScholarCross RefCross Ref
  13. P. Galvez, J.-P. Ampuero, L. A. Dalguer, S. Somala, and T. Nissen-Meyer. 2013. Dynamic earthquake rupture modeled with an unstructured 3d spectral element method applied to the 2011 M9 Tohoku earthquake. Geophys. J. Int. 198, 2 (December 2013), 1222--1240. DOI:http://dx.doi.org/10.1093/gji/ggu203Google ScholarGoogle Scholar
  14. D. L. George. 2008. Augmented Riemann solvers for the shallow water equations over variable topography with steady states and inundation. J. Comput. Phys. 227, 6 (2008), 3089--3113. DOI:http://dx.doi.org/10.1016/j.jcp.2007.10.027 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Horrillo, Z. Kowalik, and Y. Shigihara. 2006. Wave dispersion study in the indian ocean-tsunami of December 26, 2004. Marine Geodesy 29, 3 (2006), 149--166. DOI:http://dx.doi.org/10.1080/01490410600939140Google ScholarGoogle ScholarCross RefCross Ref
  16. D. Kalise, I. Lie, and E. F. Toro. 2011. High-order finite volume schemes for layered atmospheric models. CoRR abs/1110.6834 (2011).Google ScholarGoogle Scholar
  17. R. J. LeVeque. 2002. Finite Volume Methods for Hyperbolic Problems. Cambridge University Press.Google ScholarGoogle Scholar
  18. R. J. LeVeque, D. L. George, and M. J. Berger. 2011. Tsunami modelling with adaptively refined finite volume methods. Acta Numer. 20 (5 2011), 211--289. DOI:http://dx.doi.org/10.1017/S0962492911000043Google ScholarGoogle Scholar
  19. A. Logg and G. N. Wells. 2010. DOLFIN: Automated finite element computing. ACM Trans. Math. Softw. 37, 2, Article 20 (April 2010), 28 pages. DOI:http://dx.doi.org/10.1145/1731022.1731030 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. K. T. Mandli. 2013. A numerical method for the two layer shallow water equations with dry states. Ocean Modelling 72, (2013), 80--91. DOI:http://dx.doi.org/10.1016/j.ocemod.2013.08.001Google ScholarGoogle ScholarCross RefCross Ref
  21. J. D. McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995), 19--25.Google ScholarGoogle Scholar
  22. O. Meister and M. Bader. 2015. 2D adaptivity for 3d problems: Parallel SPE10 reservoir simulation on dynamically adaptive prism grids. J. Comput. Sci. 9, (May 2015), 101--106.Google ScholarGoogle ScholarCross RefCross Ref
  23. O. Meister, K. Rahnema, and M. Bader. 2012. A software concept for cache-efficient simulation on dynamically adaptive structured triangular grids. In Applications, Tools and Techniques on the Road to Exascale Computing (Advances in Parallel Computing), K. De Boschhere, E. H. D’Hollander, G. R. Joubert, D. Padua, and F. Peters (Eds.), Vol. 22. ParCo 2012, IOS Press, 251--260. DOI:http://dx.doi.org/10.3233/978-1-61499-041-3-251Google ScholarGoogle Scholar
  24. G. Meurant. 1987. Multitasking the conjugate gradient method on the CRAY X-MP/48. Parallel Comput. 5, 3 (1987), 267--280. DOI:http://dx.doi.org/10.1016/0167-8191(87)90037-8Google ScholarGoogle ScholarCross RefCross Ref
  25. W. F. Mitchell. 1991. Adaptive refinement for arbitrary finite-element spaces with hierarchical bases. J. Comput. Appl. Math. 36 (1991), 65--78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. W. F. Mitchell. 2007. A refinement-tree based partitioning method for dynamic load balancing with adaptively refined grids. J. Parallel Distrib. Comput. 67, 4 (2007), 417--429. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Pajarola. 1998. Large scale terrain visualization using the restricted quadtree triangulation. In Proceedings of the Conference on Visualization (VIS’98). IEEE Computer Society Press, Los Alamitos, CA, USA, 19--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Pinar and C. Aykanat. 2004. Fast optimal load balancing algorithms for 1D partitioning. J. Parallel Distrib. Comput. 64, 8 (2004), 974--996. DOI:http://dx.doi.org/10.1016/j.jpdc.2004.05.003 Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Popinet. 2011. Quadtree-adaptive tsunami modelling. Ocean Dynamics 61, 9 (2011), 1261--1285. DOI:http://dx.doi.org/10.1007/s10236-011-0438-zGoogle ScholarGoogle ScholarCross RefCross Ref
  30. S. Rettenberger. 2012. Ein Paralleler Server Für Adaptive Geoinformation in Strömungssimulationen. Master’s thesis. Institut für Informatik, Technische Universität München.Google ScholarGoogle Scholar
  31. Y. Saad. 1985. Practical use of polynomial preconditionings for the conjugate gradient method. SIAM J. Sci. Statist. Comput. 6, 4 (1985), 865--881. DOI:http://dx.doi.org/10.1137/0906059Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. K. Schloegel, G. Karypis, V. Kumar, R. Biswas, and L. Oliker. 1998. A performance study of diffusive vs. remapped load-balancing schemes. In ISCA 11th International Conference on Parallel and Distributed Computing Systems. International Society for Computers and Their Applications, 59--66.Google ScholarGoogle Scholar
  33. M. Schreiber, H.-J. Bungartz, and M. Bader. 2012. Shared memory parallelization of fully-adaptive simulations using a dynamic tree-split and -join approach. In IEEE International Conference on High Performance Computing (HiPC’12), IEEE Xplore, 1--10. DOI:http://dx.doi.org/10.1109/HiPC.2012.6507479Google ScholarGoogle Scholar
  34. J. R. Shewchuk. 1994. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Technical Report. Carnegie Mellon University, Pittsburgh, PA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. L. Velho, L. de Figueiredo, and J. Gomes. 1999. Hierarchical generalized triangle strips. Vis. Comput. 15, 1 (1999), 21--35. DOI:http://dx.doi.org/10.1007/s003710050160Google ScholarGoogle ScholarCross RefCross Ref
  36. C. A. Vigh. 2012. Parallel Simulation of the Shallow Water Equations on Structured Dynamically Adaptive Triangular Grids. Dissertation. Technische Universitt München, München.Google ScholarGoogle Scholar
  37. T. Weinzierl and M. Mehl. 2011. Peano -- A traversal and storage scheme for octree-like adaptive Cartesian multiscale grids. SIAM J. Sci. Comput. 33, 5 (2011), 2732--2760. DOI:http://dx.doi.org/10.1137/100799071 Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. T. Weinzierl, R. Wittmann, K. Unterweger, M. Bader, A. Breuer, and S. Rettenberger. 2013. Hardware-aware block size tailoring on adaptive spacetree grids for shallow water waves. In 1st International Workshop on High-Performance Stencil Computations (HiStencils’14). HiStencils, 57--64.Google ScholarGoogle Scholar
  39. S. Williams, A. Waterman, and D. Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (April 2009), 65--76. DOI:http://dx.doi.org/k10.1145/ 1498765.1498785 Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. G. Zumbusch. 2001. On the quality of space-filling curve induced partitions. Zeitschrift für Angewandte Mathematik und Mechanik 81, Suppl. 1 (2001), 25--28. DOI:http://dx.doi.org/10.1007/3-540-47789-6_4Google ScholarGoogle Scholar
  41. G. Zumbusch. 2002. Load balancing for adaptively refined grids. Proc. Appl. Math. Mechanics 1 (2002), 534--537.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Parallel Memory-Efficient Adaptive Mesh Refinement on Structured Triangular Meshes with Billions of Grid Cells

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Mathematical Software
      ACM Transactions on Mathematical Software  Volume 43, Issue 3
      September 2017
      232 pages
      ISSN:0098-3500
      EISSN:1557-7295
      DOI:10.1145/2988516
      Issue’s Table of Contents

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 September 2016
      • Accepted: 1 May 2016
      • Revised: 1 December 2015
      • Received: 1 September 2014
      Published in toms Volume 43, Issue 3

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader