skip to main content
10.1145/3093172.3093230acmconferencesArticle/Chapter ViewAbstractPublication PagespascConference Proceedingsconference-collections
research-article

Asynchronous Task-Based Parallelization of Algebraic Multigrid

Authors Info & Claims
Published:26 June 2017Publication History

ABSTRACT

As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous. We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using a hybrid MPI+OmpSs (from the Barcelona Supercomputer Center) within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and the electromagnetic diffusion, respectively. In time to solution for a full solve an MPI-OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per node advantage as both weak scale to thousands of cores, with MPI between nodes.

References

  1. 2016. Intel Xeon Processor E5 v3 Product Family: Processor Specification Update. Technical Report 330785-010US. Intel Corporation.Google ScholarGoogle Scholar
  2. Ahmad Abdelfattah, David Keyes, and Hatem Ltaief. 2016. KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators. ACM Trans. Math. Softw. 42, 3, Article 18 (May 2016), 31 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Emmanuel Agullo, Jim Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, and Stanimire Tomov. 2009. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. Journal of Physics: Conference Series 180, 1 (2009), 012037. http://stacks.iop.org/1742-6596/180/i=1/a=012037Google ScholarGoogle ScholarCross RefCross Ref
  4. Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier. 2011. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience 23, 2 (2011), 187--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Allison H. Baker, Robert D. Falgout, Todd Gamblin, Tzanio V. Kolev, Martin Schulz, and Ulrike Meier Yang. 2012. Scaling Algebraic Multigrid Solvers: On the Road to Exascale. Springer Berlin Heidelberg, Berlin, Heidelberg, 215--226.Google ScholarGoogle Scholar
  6. Barcelona Supercomputing Center 2016. Extrae. Barcelona Supercomputing Center. https://www.bsc.es/computer-sciences/extraeGoogle ScholarGoogle Scholar
  7. Barcelona Supercomputing Center 2016. Paraver: Performance Analysis Tools. Barcelona Supercomputing Center. https://www.bsc.es/computer-sciences/performance-tools/paraverGoogle ScholarGoogle Scholar
  8. Peter Bastian, Markus Blatt, Andreas Dedner, Christian Engwer, Robert Klöfkorn, Ralf Kornhuber, Mario Ohlberger, and Oliver Sander. 2008. A generic grid interface for parallel and adaptive scientific computing. Part II: Implementation and tests in DUNE. Computing 82, 2 (2008), 121--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Michael Bauer, Sean Treichler, Elliott Slaughter, and Alex Aiken. 2012. Legion: Expressing Locality and Independence with Logical Regions. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Los Alamitos, CA, USA, Article 66, 11 pages. http://dl.acm.org/citation.cfm?id=2388996.2389086 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Wieb Bosma, John Cannon, and Catherine Playoust. 1997. The MAGMA algebra system. I. The user language. J. Symbolic Comput. 24, 3--4 (1997), 235--265. Computational algebra and number theory (London, 1993). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Achi Brandt. 1977. Multi-level adaptive solutions to boundary-value problems. Math. Comp. 31 (1977), 333--390.Google ScholarGoogle ScholarCross RefCross Ref
  12. Marian Brezina, R Falgout, Scott MacLachlan, T Manteuffel, S McCormick, and John Ruge. 2006. Adaptive Algebraic Multigrid. SIAM Journal on Scientific Computing 27, 4 (2006), 1261--1286. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory 2016. hypre: High Performnce Preconditioners. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory.Google ScholarGoogle Scholar
  14. Hans De Sterck, Robert D. Falgout, Joshua W. Nolting, and Ulrike Meier Yang. 2008. Distance-two interpolation for parallel algebraic multigrid. Numerical Linear Algebra with Applications 15, 2--3 (2008), 115--139.Google ScholarGoogle ScholarCross RefCross Ref
  15. Jack Dongarra, Jakub Kurzak, Asim YarKhan, Mathieu Faverge, and Vijay Joshi. 2011. QUARK (QUeuing And Runtime for Kernels) Library. (2011). http://icl.cs.utk.edu/quark/index.htmlGoogle ScholarGoogle Scholar
  16. Gábor Dózsa, Sameer Kumar, Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Joe Ratterman, and Rajeev Thakur. 2010. Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 11--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Alejandro Duran, Eduard Ayguad, Rosa M. Badia, Jesus Labarta, Luis Martinell, Xavier Martorell, and Judit Planas. 2011. OmpSs: A PROPOSAL FOR PROGRAMMING HETEROGENEOUS MULTI-CORE ARCHITECTURES. Parallel Processing Letters 21, 02 (2011), 173--193.Google ScholarGoogle ScholarCross RefCross Ref
  18. Hormozd Gahvari, Allison H. Baker, Martin Schulz, Ulrike Meier Yang, Kirk E. Jordan, and William Gropp. 2011. Modeling the Performance of an Algebraic Multigrid Cycle on HPC Platforms. In Proceedings of the International Conference on Supercomputing (ICS '11). ACM, New York, NY, USA, 172--181. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M.W. Gee, C.M. Siefert, J.J. Hu, R.S. Tuminaro, and M.G. Sala. 2006. ML 5.0 Smoothed Aggregation User's Guide. Technical Report SAND 2006-2649. Sandia National Laboratories.Google ScholarGoogle Scholar
  20. W. Hackbusch. 1985. Multigrid Methods and Applications. Springer-Verlag.Google ScholarGoogle Scholar
  21. Van Emden Henson and Ulrike Meier Yang. 2002. BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics 41, 1 (2002), 155--177. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian Barrett, Ron Brightwell, William Gropp, Vivek Kale, and Rajeev Thakur. 2013. MPI+ MPI: a new hybrid approach to parallel programming with MPI plus shared memory. Computing 95, 12 (2013), 1121--1136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Tzanio V. Kolev and Panayot S. Vassilevski. 2008. Auxiliary Space AMG for H(curl) Problems. Springer Berlin Heidelberg, Berlin, Heidelberg, 147--154.Google ScholarGoogle Scholar
  24. Jesús Labarta. 2010. StarSs: A Programming Model for the Multicore Era. (2010-03-01 2010).Google ScholarGoogle Scholar
  25. X. Lacoste. 2015. Scheduling and memory optimizations for sparse direct solver on multi-core/multi-GPU cluster systems. Ph.D. Dissertation. LaBRI, UniversitÃl' Bordeaux, Talence, France.Google ScholarGoogle Scholar
  26. Jan Mandel, Steve McCormick, and John Ruge. 1988. An Algebraic Theory for Multigrid Methods for Variational Problems. SIAM J. Numer. Anal. 25, 1 (Feb. 1988), 91--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. McCormick. 1989. Multilevel Adaptive Methods for Partial Differential Equations. Society for Industrial and Applied Mathematics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. William F. Mitchell. 1999. The Full Domain Partition Approach to Parallel Adaptive Refinement. Springer New York, New York, NY, 151--161.Google ScholarGoogle Scholar
  29. Philip J. Mucci, Shirley Browne, Christine Deane, and George Ho. 1999. PAPI: A Portable Interface to Hardware Performance Counters. In In Proceedings of the Department of Defense HPCMP Users Group Conference. 7--10.Google ScholarGoogle Scholar
  30. Jongsoo Park, Mikhail Smelyanskiy, Ulrike Meier Yang, Dheevatsa Mudigere, and Pradeep Dubey. 2015. High-performance Algebraic Multigrid Solver Optimized for Multi-core Based Distributed Parallel Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). ACM, New York, NY, USA, Article 54, 12 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Sebastian Reiter, Andreas Vogel, Ingo Heppner, Martin Rupp, and Gabriel Wittum. 2013. A Massively Parallel Geometric Multigrid Solver on Hierarchically Distributed Grids. Comput. Vis. Sci. 16, 4 (Aug. 2013), 151--164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Hans De Sterck, Ulrike Meier Yang, and Jeffrey J. Heys. 2006. Reducing Complexity in Parallel Algebraic Multigrid Preconditioners. SIAM J. Matrix Anal. Appl. 27, 4 (2006), 1019--1039. arXiv:http://dx.doi.org/10.1137/040615729 Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Didem Unat, Tan Nguyen, Weiqun Zhang, Muhammed Nufail Farooqi, Burak Bastem, George Michelogiannakis, Ann Almgren, and John Shalf. 2016. TiDA: High-Level Programming Abstractions for Data Locality Management. Springer International Publishing, Cham, 116--135.Google ScholarGoogle Scholar
  34. Panayot S. Vassilevski and Ulrike M. Yang. 2014. Reducing communication in algebraic multigrid using additive variants. Numerical Lin. Alg. with Applic. 21, 2 (2014), 275--296.Google ScholarGoogle ScholarCross RefCross Ref
  35. P. Wesseling. 1982. A robust and efficient multigrid method. Springer Berlin Heidelberg, Berlin, Heidelberg, 614--630.Google ScholarGoogle Scholar
  36. Ulrike Meier Yang. 2006. Parallel Algebraic Multigrid Methods --- High Performance Preconditioners. Springer Berlin Heidelberg, Berlin, Heidelberg, 209--236.Google ScholarGoogle Scholar

Index Terms

  1. Asynchronous Task-Based Parallelization of Algebraic Multigrid

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PASC '17: Proceedings of the Platform for Advanced Scientific Computing Conference
          June 2017
          136 pages
          ISBN:9781450350624
          DOI:10.1145/3093172

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 June 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          PASC '17 Paper Acceptance Rate13of33submissions,39%Overall Acceptance Rate83of185submissions,45%

          Upcoming Conference

          PASC '24
          Platform for Advanced Scientific Computing Conference
          June 3 - 5, 2024
          Zurich , Switzerland

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader