From the Publisher:
This book presents a unified treatment of recently developed techniques and current understanding about solving systems of linear equations and large-scale eigenvalue problems on high-performance computers. It provides a rapid introduction to the world of vector and parallel processing for these linear algebra applications.
Topics include major elements of advanced-architecture computers and their performance, recent algorithmic development, and software for direct solution of dense matrix problems, direct solution of sparse systems of equations, iterative solution of sparse systems of equations, and solution of large sparse eigenvalue problems.
This book supersedes the SIAM publication Solving Linear Systems on Vector and Shared Memory Computers, which appeared in 1990. The new book includes a considerable amount of new material in addition to incorporating a substantial revision of existing text.
About the Authors:
Jack J. Dongarra is a Distinguished Professor of Computer Science at the University of Tennessee and a Distinguished Scientist at Oak Ridge National Laboratory. Iain S. Duff is Group Leader of Numerical Analysis at the CCLRC Rutherford Appleton Laboratory, the Project Leader for the Parallel Algorithms Group at CERFACS in Toulouse, and a Visiting Professor of Mathematics at the University of Strathclyde. Danny C. Sorensen is a Professor of Computational and Applied Mathematics at Rice University. Henk A. van der Vorst is a Professor in Numerical Analysis at Utrecht University in the Netherlands.
Cited By
- Dumas J, van der Hoeven J, Pernet C and Roche D LU Factorization with Errors Proceedings of the 2019 on International Symposium on Symbolic and Algebraic Computation, (131-138)
- Amestoy P, Buttari A, L'Excellent J and Mary T (2019). Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, ACM Transactions on Mathematical Software, 45:1, (1-26), Online publication date: 28-Mar-2019.
- Thomas A and Kumar A (2018). A comparative evaluation of systems for scalable linear algebra-based analytics, Proceedings of the VLDB Endowment, 11:13, (2168-2182), Online publication date: 1-Sep-2018.
- Dumas J and Pernet C Symmetric Indefinite Triangular Factorization Revealing the Rank Profile Matrix Proceedings of the 2018 ACM International Symposium on Symbolic and Algebraic Computation, (151-158)
- Bentbib A, El Guide M, Jbilou K and Reichel L (2017). Global Golub-Kahan bidiagonalization applied to large discrete ill-posed problems, Journal of Computational and Applied Mathematics, 322:C, (46-56), Online publication date: 1-Oct-2017.
- Dumas J, Gautier T, Pernet C, Roch J and Sultan Z (2016). Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination, Parallel Computing, 57:C, (235-249), Online publication date: 1-Sep-2016.
- Jo G, Nah J, Lee J, Kim J and Lee J (2015). Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes, IEEE Transactions on Parallel and Distributed Systems, 26:7, (1814-1825), Online publication date: 1-Jul-2015.
- Dumas J, Pernet C and Sultan Z Computing the Rank Profile Matrix Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation, (149-156)
- Gilli M and Schumann E (2014). Optimization cultures, WIREs Computational Statistics, 6:5, (352-358), Online publication date: 18-Aug-2014.
- Dumas J, Pernet C and Sultan Z Simultaneous computation of the row and column rank profiles Proceedings of the 38th International Symposium on Symbolic and Algebraic Computation, (181-188)
- Zhang W, Betz V and Rose J (2012). Portable and scalable FPGA-based acceleration of a direct linear system solver, ACM Transactions on Reconfigurable Technology and Systems, 5:1, (1-26), Online publication date: 1-Mar-2012.
- Luszczek P and Dongarra J Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modeling Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (730-739)
- Rozložník M, Shklarski G and Toledo S (2011). Partitioned Triangular Tridiagonalization, ACM Transactions on Mathematical Software, 37:4, (1-16), Online publication date: 1-Feb-2011.
- Michelini P (2010). Direct multi-grid methods for linear systems with harmonic aliasing patterns, IEEE Transactions on Signal Processing, 58:10, (5091-5105), Online publication date: 1-Oct-2010.
- Verschelde J and Yoffe G Polynomial homotopies on multicore workstations Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, (131-140)
- Anzt H, Heuveline V and Rocker B An error correction solver for linear systems Proceedings of the 9th international conference on High performance computing for computational science, (58-70)
- Anzt H, Heuveline V and Rocker B Mixed precision iterative refinement methods for linear systems Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, (237-247)
- Gustavson F, Waśniewski J, Dongarra J and Langou J (2010). Rectangular full packed format for Cholesky's algorithm, ACM Transactions on Mathematical Software, 37:2, (1-21), Online publication date: 1-Apr-2010.
- Emiris I, Pan V and Tsigaridas E Algebraic and numerical algorithms Algorithms and theory of computation handbook, (17-17)
- Soveiko N, Nakhla M and Achar R (2010). Comparison study of performance of parallel steady state solver on different computer architectures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 29:1, (65-77), Online publication date: 1-Jan-2010.
- Bekas C, Curioni A and Fedulova I Low cost high performance uncertainty quantification Proceedings of the 2nd Workshop on High Performance Computational Finance, (1-8)
- Burckhardt K, Szczerba D, Brown J, Muralidhar K and Székely G Fast Implicit Simulation of Oscillatory Flow in Human Abdominal Bifurcation Using a Schur Complement Preconditioner Proceedings of the 15th International Euro-Par Conference on Parallel Processing, (747-759)
- Fatica M Accelerating linpack with CUDA on heterogenous clusters Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, (46-51)
- Gustavson F, Karlsson L and Kågström B (2009). Distributed SBP Cholesky factorization algorithms with near-optimal scheduling, ACM Transactions on Mathematical Software, 36:2, (1-25), Online publication date: 1-Mar-2009.
- Kurzak J and Dongarra J (2009). QR factorization for the Cell Broadband Engine, Scientific Programming, 17:1-2, (31-42), Online publication date: 1-Jan-2009.
- Volkov V and Demmel J Benchmarking GPUs to tune dense linear algebra Proceedings of the 2008 ACM/IEEE conference on Supercomputing, (1-11)
- Pan V, Ivolgin D, Murphy B, Rosholt R, Tang Y and Yan X Additive preconditioning for matrix computations Proceedings of the 3rd international conference on Computer science: theory and applications, (372-383)
- Howell G, Demmel J, Fulton C, Hammarling S and Marmol K (2008). Cache efficient bidiagonalization using BLAS 2.5 operators, ACM Transactions on Mathematical Software, 34:3, (1-33), Online publication date: 1-May-2008.
- Sala M, Stanley K and Heroux M (2008). On the design of interfaces to sparse direct solvers, ACM Transactions on Mathematical Software, 34:2, (1-22), Online publication date: 1-Mar-2008.
- Burke E, Dror M and Orlin J (2008). Scheduling malleable tasks with interdependent processing rates, Discrete Applied Mathematics, 156:5, (620-626), Online publication date: 1-Mar-2008.
- Coulaud O, Fortin P and Roman J (2008). High performance BLAS formulation of the multipole-to-local operator in the fast multipole method, Journal of Computational Physics, 227:3, (1836-1862), Online publication date: 1-Jan-2008.
- Zekri A and Sedukhin S Performance evaluation of basic linear algebra subroutines on a matrix co-processor Proceedings of the 7th international conference on Parallel processing and applied mathematics, (1190-1199)
- Pan V, Murphy B, Rosholt R and Tabanjeh M The Schur aggregation for solving linear systems of equations Proceedings of the 2007 international workshop on Symbolic-numeric computation, (142-151)
- Belgin M, Ribbens C and Back G An operation stacking framework for large ensemble computations Proceedings of the 21st annual international conference on Supercomputing, (83-92)
- Gould N, Scott J and Hu Y (2007). A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations, ACM Transactions on Mathematical Software, 33:2, (10-es), Online publication date: 1-Jun-2007.
- Bertoldo A, Bianco M and Pucci G A static parallel multifrontal solver for finite element meshes Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications, (734-746)
- Dhillon I, Parlett B and Vömel C (2006). The design and implementation of the MRRR algorithm, ACM Transactions on Mathematical Software, 32:4, (533-560), Online publication date: 1-Dec-2006.
- Chen W and Poirier B (2006). Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. II, Journal of Computational Physics, 219:1, (198-209), Online publication date: 20-Nov-2006.
- Gradl T, Spörl A, Huckle T, Glaser S and Schulte-Herbrüggen T Parallelising matrix operations on clusters for an optimal control-based quantum compiler Proceedings of the 12th international conference on Parallel Processing, (751-762)
- Acebrón J, Durán R, Rico R and Spigler R A new domain decomposition approach suited for grid computing Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (744-753)
- Remón A, Quintana-Ortí E and Quintana-Ortí G Cholesky factorization of band matrices using multithreaded BLAS Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (608-616)
- Kurzak J and Dongarra J Implementing linear algebra routines on multi-core processors with pipelining and a look ahead Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (147-156)
- Demmel J, Dongarra J, Parlett B, Kahan W, Gu M, Bindel D, Hida Y, Li X, Marques O, Riedy E, Vömel C, Langou J, Luszczek P, Kurzak J, Buttari A, Langou J and Tomov S Prospectus for the next LAPACK and ScaLAPACK libraries Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (11-23)
- Marques O and Vasconcelos P Evaluation of linear solvers for astrophysics transfer problems Proceedings of the 7th international conference on High performance computing for computational science, (466-475)
- Gravvanis G and Giannoutakis K Parallel exact and approximate arrow-type inverses on symmetric multiprocessor systems Proceedings of the 6th international conference on Computational Science - Volume Part I, (506-513)
- Gravvanis G and Giannoutakis K On the performance of parallel normalized explicit preconditioned conjugate gradient type methods Proceedings of the 20th international conference on Parallel and distributed processing, (309-309)
- Amestoy P, Guermouche A, L'Excellent J and Pralet S (2006). Hybrid scheduling for the parallel solution of linear systems, Parallel Computing, 32:2, (136-156), Online publication date: 1-Feb-2006.
- Galoppo N, Govindaraju N, Henson M and Manocha D LU-GPU Proceedings of the 2005 ACM/IEEE conference on Supercomputing
- Zuberek W and Perera T Performance analysis of distributed iterative linear solvers Proceedings of the 7th WSEAS International Conference on Mathematical Methods and Computational Techniques In Electrical Engineering, (194-199)
- Garzón E and García I (2005). Approaches Based on Permutations for Partitioning Sparse Matrices on Multiprocessors, The Journal of Supercomputing, 34:1, (41-61), Online publication date: 1-Oct-2005.
- D'Azevedo E, Fahey M and Mills R Vectorized sparse matrix multiply for compressed row storage format Proceedings of the 5th international conference on Computational Science - Volume Part I, (99-106)
- Gravvanis G and Giannoutakis K (2005). Parallel preconditioned conjugate gradient square method based on normalized approximate inverses, Scientific Programming, 13:2, (79-91), Online publication date: 1-Apr-2005.
- D'Alberto P and Nicolau A JuliusC Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (117-131)
- Vasconcelos P and d'Almeida F Performance evaluation of a parallel algorithm for a radiative transfer problem Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (864-871)
- Ortigosa E, Romero L and Ramos J (2003). Parallel scheduling of the PCG method for banded matrices rising from FDM/FEM, Journal of Parallel and Distributed Computing, 63:12, (1243-1256), Online publication date: 1-Dec-2003.
- Nakajima K Parallel Iterative Solvers of GeoFEM with Selective Blocking Preconditioning for Nonlinear Contact Problems on the Earth Simulator Proceedings of the 2003 ACM/IEEE conference on Supercomputing
- Chen Z, Dongarra J, Luszczek P and Roche K (2003). Self-adapting software for numerical linear algebra and LAPACK for clusters, Parallel Computing, 29:11-12, (1723-1743), Online publication date: 1-Nov-2003.
- Amestoy P, Duff I, Pralet S and Vömel C (2003). Adapting a parallel sparse direct solver to architectures with clusters of SMPs, Parallel Computing, 29:11-12, (1645-1668), Online publication date: 1-Nov-2003.
- Hanson F (2003). Local supercomputing training in the computational sciences using remote national centers, Future Generation Computer Systems, 19:8, (1335-1347), Online publication date: 1-Nov-2003.
- Chronopoulos A, Grosu D, Wissink A, Benche M and Liu J (2003). An efficient 3D grid based scheduling for heterogeneous systems, Journal of Parallel and Distributed Computing, 63:9, (827-837), Online publication date: 1-Sep-2003.
- Tracy F Application of the multi-level parallelism (MLP) software to a finite element groundwater program using iterative solvers with comparison to MPI Proceedings of the 2003 international conference on Computational science: Part III, (725-735)
- Benzi M and Bertaccini D (2003). Approximate Inverse Preconditioning for Shifted Linear Systems, BIT, 43:2, (231-244), Online publication date: 1-Jun-2003.
- Foschi P and Kontoghiorghes E (2003). Estimation of VAR Models, Computational Economics, 21:1-2, (3-22), Online publication date: 1-Feb-2003.
- Gryazin Y, Klibanov M and Lucas T (2003). Two numerical methods for an inverse problem for the 2-D Helmholtz equation, Journal of Computational Physics, 184:1, (122-148), Online publication date: 1-Jan-2003.
- Shen C and Zhang J (2002). Parallel two level block ILU Preconditioning techniques for solving large sparse linear systems, Parallel Computing, 28:10, (1451-1475), Online publication date: 1-Oct-2002.
- McCombs J and Stathopoulos A Multigrain Parallelism for Eigenvalue Computations on Networks of Clusters Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
- Katagiri T Performance evaluation of parallel Gram-Schmidt re-orthogonalization methods Proceedings of the 5th international conference on High performance computing for computational science, (302-314)
- Schönauer W and Häfner H (2002). Numerical experiments to optimize the use of (I)LU preconditioning in the iterative linear solver package LINSOL, Applied Numerical Mathematics, 41:1, (23-37), Online publication date: 1-Apr-2002.
- Simoncini V and Eldén L (2002). Inexact Rayleigh Quotient-Type Methods for Eigenvalue Computations, BIT, 42:1, (159-182), Online publication date: 1-Mar-2002.
- Zlatev Z Massive data set issues in air pollution modelling Handbook of massive data sets, (1169-1220)
- Breitner M (2000). Robust Optimal Onboard Reentry Guidance of a Space Shuttle, Journal of Optimization Theory and Applications, 107:3, (481-503), Online publication date: 1-Dec-2000.
- Theobald K, Agrawal G, Kumar R, Heber G, Gao G, Stodghill P and Pingali K Landing CG on EARTH Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (4-es)
- Lee H, Kim J, Hong S and Lee S Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization Proceedings of the 2000 ACM symposium on Applied computing - Volume 2, (641-648)
Recommendations
High performance linear algebra algorithms: an introduction
PARA'04: Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
This Mini-Symposium consisted of two back to back sessions, each consisting of five presentations, held on the afternoon of Monday, June 21, 2004. A major theme of both sessions was novel data structures for the matrices of dense linear algebra, DLA. ...