ScaLAPACK user's guide
October 1997
Publisher:
  • Society for Industrial and Applied Mathematics
  • 3600 University City Science Center Philadelphia, PA
  • United States
ISBN: 978-0-89871-397-8
Published: 01 October 1997
Pages: 325
Abstract

No abstract available.

Cited By

  1. Pan J, Xiao L, Tian M, Liu T and Wang L Heterogeneous multi-core optimization of MUMPS solver and its application Proceedings of the 2021 ACM International Conference on Intelligent Computing and its Emerging Applications, (122-127)
  2. Chang T, Larson J and Watson L Multiobjective optimization of the variability of the high-performance linpack solver Proceedings of the Winter Simulation Conference, (3081-3092)
  3. Cámara J, Cuenca J and Giménez D (2020). Integrating software and hardware hierarchies in an autotuning method for parallel routines in heterogeneous clusters, The Journal of Supercomputing, 76:12, (9922-9941), Online publication date: 1-Dec-2020.
  4. Ben M, Yang C, Li Z, Jornada F, Louie S and Deslippe J Accelerating large-scale excited-state GW calculations on leadership HPC systems Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  5. Luo S, Gao Z, Gubanov M, Perez L, Jankov D and Jermaine C (2020). Scalable linear algebra on a relational database system, Communications of the ACM, 63:8, (93-101), Online publication date: 22-Jul-2020.
  6. Liu Y, Sid-Lakhdar W, Rebrova E, Ghysels P and Li X (2020). A parallel hierarchical blocked adaptive cross approximation algorithm, International Journal of High Performance Computing Applications, 34:4, (394-408), Online publication date: 1-Jul-2020.
  7. Del Ben M, Marques O and Canning A Improved Unconstrained Energy Functional Method for Eigensolvers in Electronic Structure Calculations Proceedings of the 48th International Conference on Parallel Processing, (1-11)
  8. Li S, Liu J and Du Y (2022). A high performance implementation of Zolo-SVD algorithm on distributed memory systems, Parallel Computing, 86:C, (57-65), Online publication date: 1-Aug-2019.
  9. Dongarra J, Gates M, Haidar A, Kurzak J, Luszczek P, Wu P, Yamazaki I, Yarkhan A, Abalenkovs M, Bagherpour N, Hammarling S, Šístek J, Stevens D, Zounon M and Relton S (2019). PLASMA, ACM Transactions on Mathematical Software, 45:2, (1-35), Online publication date: 30-Jun-2019.
  10. Kurzak J, Gates M, Charara A, YarKhan A and Dongarra J Least squares solvers for distributed-memory machines with GPU accelerators Proceedings of the ACM International Conference on Supercomputing, (117-126)
  11. Nguyen D, Filippone M and Michiardi P Exact gaussian process regression with distributed computations Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, (1286-1295)
  12. Peise E and Bientinesi P (2020). The ELAPS framework, International Journal of High Performance Computing Applications, 33:2, (353-365), Online publication date: 1-Mar-2019.
  13. Beaumont O, Becker B, DeFlumere A, Eyraud-Dubois L, Lambert T and Lastovetsky A (2018). Recent Advances in Matrix Partitioning for Parallel Computing on Heterogeneous Platforms, IEEE Transactions on Parallel and Distributed Systems, 30:1, (218-229), Online publication date: 1-Jan-2019.
  14. Lee D, Hoshi T, Sogabe T, Miyatake Y and Zhang S (2018). Solution of the k-th eigenvalue problem in large-scale electronic structure calculations, Journal of Computational Physics, 371:C, (618-632), Online publication date: 15-Oct-2018.
  15. Thomas A and Kumar A (2018). A comparative evaluation of systems for scalable linear algebra-based analytics, Proceedings of the VLDB Endowment, 11:13, (2168-2182), Online publication date: 1-Sep-2018.
  16. Thomas A and Kumar A (2019). A comparative evaluation of systems for scalable linear algebra-based analytics, Proceedings of the VLDB Endowment, 11:13, (2168-2182), Online publication date: 1-Sep-2018.
  17. Ballard G, Demmel J, Grigori L, Jacquelin M and Knight N A 3D Parallel Algorithm for QR Decomposition Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, (55-65)
  18. Rathore M, Son H, Ahmad A, Paul A and Jeon G (2018). Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem, International Journal of Parallel Programming, 46:3, (630-646), Online publication date: 1-Jun-2018.
  19. Kalantzis V, Malossi A, Bekas C, Curioni A, Gallopoulos E and Saad Y (2018). A scalable iterative dense linear system solver for multiple right-hand sides in data analytics, Parallel Computing, 74:C, (136-153), Online publication date: 1-May-2018.
  20. Ida A, Nakashima H and Kawai M Parallel Hierarchical Matrices with Block Low-rank Representation on Distributed Memory Computer Systems Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, (232-240)
  21. Hoque R, Herault T, Bosilca G and Dongarra J Dynamic task discovery in PaRSEC Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, (1-8)
  22. Rico-Gallego J, Lastovetsky A and Díaz-Martín J (2017). Model-Based Estimation of the Communication Cost of Hybrid Data-Parallel Applications on Heterogeneous Clusters, IEEE Transactions on Parallel and Distributed Systems, 28:11, (3215-3228), Online publication date: 1-Nov-2017.
  23. Carlberg K, Barone M and Antil H (2017). Galerkin v. least-squares Petrov-Galerkin projection in nonlinear model reduction, Journal of Computational Physics, 330:C, (693-734), Online publication date: 1-Feb-2017.
  24. Stoykov S and Margenov S (2017). Numerical methods and parallel algorithms for computation of periodic responses of plates, Journal of Computational and Applied Mathematics, 310:C, (200-212), Online publication date: 15-Jan-2017.
  25. Ballard G, Demmel J, Gearhart A, Lipshitz B, Oltchik Y, Schwartz O and Toledo S Network topologies and inevitable contention Proceedings of the First Workshop on Optimization of Communication in HPC, (39-52)
  26. Lemarinier P, Hasanov K, Venugopal S and Katrinis K Architecting Malleable MPI Applications for Priority-driven Adaptive Scheduling Proceedings of the 23rd European MPI Users' Group Meeting, (74-81)
  27. Niu Q, Dinan J, Tirukkovalur S, Benali A, Kim J, Mitas L, Wagner L and Sadayappan P (2016). Global-view coefficients, Concurrency and Computation: Practice & Experience, 28:13, (3655-3671), Online publication date: 10-Sep-2016.
  28. Prikopa K, Gansterer W and Wimmer E (2016). Parallel iterative refinement linear least squares solvers based on all-reduce operations, Parallel Computing, 57:C, (167-184), Online publication date: 1-Sep-2016.
  29. Sukkari D, Ltaief H and Keyes D High Performance Polar Decomposition on Distributed Memory Systems Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, (605-616)
  30. Bosagh Zadeh R, Meng X, Ulanov A, Yavuz B, Pu L, Venkataraman S, Sparks E, Staple A and Zaharia M Matrix Computations and Optimization in Apache Spark Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (31-38)
  31. Schmidt D, Chen W and Ostrouchov G Introducing a New Client/Server Framework for Big Data Analytics with the R Language Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale, (1-9)
  32. Wang S, Li X, Rouet F, Xia J and De Hoop M (2016). A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure, ACM Transactions on Mathematical Software, 42:3, (1-21), Online publication date: 15-Jun-2016.
  33. Wu W, Bosilca G, vandeVaart R, Jeaugey S and Dongarra J GPU-Aware Non-contiguous Data Movement In Open MPI Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, (231-242)
  34. Coullon H and Limet S (2016). The SIPSim implicit parallelism model and the SkelGIS library, Concurrency and Computation: Practice & Experience, 28:7, (2120-2144), Online publication date: 1-May-2016.
  35. Chabbi M, Lavrijsen W, de Jong W, Sen K, Mellor-Crummey J and Iancu C (2015). Barrier elision for production parallel programs, ACM SIGPLAN Notices, 50:8, (109-119), Online publication date: 18-Dec-2015.
  36. Calvin J, Lewis C and Valeev E Scalable task-based algorithm for multiplication of block-rank-sparse matrices Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, (1-8)
  37. Solcà R, Kozhevnikov A, Haidar A, Tomov S, Dongarra J and Schulthess T Efficient implementation of quantum materials simulations on distributed CPU-GPU systems Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-12)
  38. Hasanov K, Quintin J and Lastovetsky A (2015). Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms, The Journal of Supercomputing, 71:11, (3991-4014), Online publication date: 1-Nov-2015.
  39. Song F and Dongarra J (2015). A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems, Concurrency and Computation: Practice & Experience, 27:14, (3702-3723), Online publication date: 25-Sep-2015.
  40. Maas R, Hyrkas J, Telford O, Balazinska M, Connolly A and Howe B Gaussian Mixture Models Use-Case Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics, (1-8)
  41. Gautier T, Roch J, Sultan Z and Vialla B Parallel algebraic linear algebra dedicated interface Proceedings of the 2015 International Workshop on Parallel Symbolic Computation, (34-43)
  42. Elgamal T, Yabandeh M, Aboulnaga A, Mustafa W and Hefeeda M sPCA Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, (79-91)
  43. Heßel S, Fernandes O, Boblest S, Offenhäuser P, Hoffmann M, Beck A, Ertl T, Glass C, Munz C and Sadlo F Visualization of 2D wave propagation by Huygens' principle Proceedings of the 15th Eurographics Symposium on Parallel Graphics and Visualization, (19-28)
  44. Berljafa M, Wortmann D and Di Napoli E (2015). An optimized and scalable eigensolver for sequences of eigenvalue problems, Concurrency and Computation: Practice & Experience, 27:4, (905-922), Online publication date: 25-Mar-2015.
  45. Ballard G, Demmel J and Knight N (2015). Avoiding Communication in Successive Band Reduction, ACM Transactions on Parallel Computing, 1:2, (1-37), Online publication date: 18-Feb-2015.
  46. Chabbi M, Lavrijsen W, de Jong W, Sen K, Mellor-Crummey J and Iancu C Barrier elision for production parallel programs Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (109-119)
  47. Galizia A, D'Agostino D and Clematis A (2015). An MPI-CUDA library for image processing on HPC architectures, Journal of Computational and Applied Mathematics, 273:C, (414-427), Online publication date: 1-Jan-2015.
  48. Solomonik E, Matthews D, Hammond J, Stanton J and Demmel J (2014). A massively parallel tensor contraction framework for coupled-cluster computations, Journal of Parallel and Distributed Computing, 74:12, (3176-3190), Online publication date: 1-Dec-2014.
  49. Charara A, Ltaief H, Gratadour D, Keyes D, Sevin A, Abdelfattah A, Gendron E, Morel C and Vidal F Pipelining computational stages of the tomographic reconstructor for multi-object adaptive optics on a multi-GPU system Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (262-273)
  50. Marker B, Batory D and van de Geijn R Understanding performance stairs Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, (301-312)
  51. Kernert D, Köhler F and Lehner W SLACID - sparse linear algebra in a column-oriented in-memory database system Proceedings of the 26th International Conference on Scientific and Statistical Database Management, (1-12)
  52. Solomonik E, Carson E, Knight N and Demmel J Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures, (307-318)
  53. Wu P and Chen Z FT-ScaLAPACK Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, (49-60)
  54. Xiang J, Meng H and Aboulnaga A Scalable matrix inversion using MapReduce Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, (177-190)
  55. Fabregat-Traver D and Bientinesi P (2014). Computing Petaflops over Terabytes of Data, ACM Transactions on Mathematical Software, 40:4, (1-22), Online publication date: 1-Jun-2014.
  56. Cao C, Dongarra J, Du P, Gates M, Luszczek P and Tomov S clMAGMA Proceedings of the International Workshop on OpenCL 2013 & 2014, (1-9)
  57. Boman E, Devine K and Rajamanickam S Scalable matrix computations on large scale-free graphs using 2D graph partitioning Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
  58. Jia Y, Bosilca G, Luszczek P and Dongarra J Parallel reduction to hessenberg form with algorithm-based fault tolerance Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  59. Haidar A, Gates M, Tomov S and Dongarra J Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication Proceedings of the 27th international ACM conference on International conference on supercomputing, (223-232)
  60. Bueno J, Martorell X, Badia R, Ayguadé E and Labarta J Implementing OmpSs support for regions of data in architectures with multiple address spaces Proceedings of the 27th international ACM conference on International conference on supercomputing, (359-368)
  61. Carlberg K, Farhat C, Cortial J and Amsallem D (2013). The GNAT method for nonlinear model reduction, Journal of Computational Physics, 242, (623-647), Online publication date: 1-Jun-2013.
  62. Bosner N, Bujanović Z and Drmač Z (2013). Efficient generalized Hessenberg form and applications, ACM Transactions on Mathematical Software, 39:3, (1-19), Online publication date: 1-Apr-2013.
  63. Ltaief H, Luszczek P and Dongarra J (2013). High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures, ACM Transactions on Mathematical Software, 39:3, (1-22), Online publication date: 1-Apr-2013.
  64. Ballard G, Demmel J, Holtz O and Schwartz O (2013). Graph expansion and communication costs of fast matrix multiplication, Journal of the ACM, 59:6, (1-23), Online publication date: 1-Dec-2012.
  65. Fujisawa K, Sato H, Matsuoka S, Endo T, Yamashita M and Nakata M High-performance general solver for extremely large-scale semidefinite programming problems Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  66. Roderus M, Matveev A and Bungartz H A high-level Fortran interface to parallel matrix algebra Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, (112-119)
  67. Du P, Bouteiller A, Bosilca G, Herault T and Dongarra J (2012). Algorithm-based fault tolerance for dense matrix factorizations, ACM SIGPLAN Notices, 47:8, (225-234), Online publication date: 11-Sep-2012.
  68. Bland W, Du P, Bouteiller A, Herault T, Bosilca G and Dongarra J A checkpoint-on-failure protocol for algorithm-based recovery in standard MPI Proceedings of the 18th international conference on Parallel Processing, (477-488)
  69. Bosilca G, Bouteiller A, Danalis A, Herault T and Dongarra J From serial loops to parallel execution on distributed systems Proceedings of the 18th international conference on Parallel Processing, (246-257)
  70. Ballard G, Demmel J, Holtz O, Lipshitz B and Schwartz O Communication-optimal parallel algorithm for strassen's matrix multiplication Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, (193-204)
  71. Song F and Dongarra J A scalable framework for heterogeneous GPU-based clusters Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, (91-100)
  72. Song F, Tomov S and Dongarra J Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems Proceedings of the 26th ACM international conference on Supercomputing, (365-376)
  73. Versaci F and Pingali K Processor allocation for optimistic parallelization of irregular programs Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I, (1-14)
  74. Zhao Y, Zhang J and Chi X Implementations of main algorithms for generalized eigenproblem on GPU accelerator Proceedings of the Third international conference on Advances in Swarm Intelligence - Volume Part II, (473-481)
  75. Izadi M (2012). Parallel $\mathcal{H}$-matrix arithmetic on distributed-memory systems, Computing and Visualization in Science, 15:2, (87-97), Online publication date: 1-Apr-2012.
  76. Du P, Bouteiller A, Bosilca G, Herault T and Dongarra J Algorithm-based fault tolerance for dense matrix factorizations Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, (225-234)
  77. Gopalakrishnan G, Kirby R, Siegel S, Thakur R, Gropp W, Lusk E, De Supinski B, Schulz M and Bronevetsky G (2011). Formal analysis of MPI-based parallel programs, Communications of the ACM, 54:12, (82-91), Online publication date: 1-Dec-2011.
  78. Dongarra J, Faverge M, Ltaief H and Luszczek P High performance matrix inversion based on LU factorization for multicore architectures Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers, (33-42)
  79. Solomonik E, Bhatele A and Demmel J Improving communication performance in dense linear algebra via topology aware collectives Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  80. Lubin M, Petra C, Anitescu M and Zavala V Scalable stochastic optimization of complex energy systems Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, (1-64)
  81. Bougeret M, Casanova H, Rabie M, Robert Y and Vivien F Checkpointing strategies for parallel jobs Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  82. Haidar A, Ltaief H and Dongarra J Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  83. Bennett P Sustained systems performance monitoring at the U. S. Department of Defense high performance computing modernization program State of the Practice Reports, (1-11)
  84. Davis T (2011). Algorithm 915, SuiteSparseQR, ACM Transactions on Mathematical Software, 38:1, (1-22), Online publication date: 1-Nov-2011.
  85. Mikkelsen C and Kågström B Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (80-91)
  86. Luszczek P and Dongarra J Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modeling Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (730-739)
  87. Straková H, Gansterer W and Zemen T Distributed QR factorization based on randomized algorithms Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (235-244)
  88. Pilarek M and Wyrzykowski R Solving systems of interval linear equations in parallel using multithreaded model and "interval extended zero" method Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (206-214)
  89. Gustavson F Cache blocking for linear algebra algorithms Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (122-132)
  90. Michailidis P and Margaritis K (2011). Parallel direct methods for solving the system of linear equations with pipelining on a multicore using OpenMP, Journal of Computational and Applied Mathematics, 236:3, (326-341), Online publication date: 1-Sep-2011.
  91. Solomonik E and Demmel J Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms Proceedings of the 17th international conference on Parallel processing - Volume Part II, (90-109)
  92. Sharma P and Hammett G (2011). A fast semi-implicit method for anisotropic diffusion, Journal of Computational Physics, 230:12, (4899-4909), Online publication date: 1-Jun-2011.
  93. Higham N (2011). Gaussian elimination, WIREs Computational Statistics, 3:3, (230-238), Online publication date: 4-Apr-2011.
  94. Paul A, Luisier M and Klimeck G (2010). Modified valence force field approach for phonon dispersion, Journal of Computational Electronics, 9:3-4, (160-172), Online publication date: 1-Dec-2010.
  95. Song F, Ltaief H, Hadri B and Dongarra J Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  96. Mao F and Shen X LU decomposition on cell broadband engine Proceedings of the 2010 IFIP international conference on Network and parallel computing, (61-75)
  97. Roderus M, Berariu A, Bungartz H, Krüger S, Matveev A and Rösch N Scheduling parallel eigenvalue computations in a quantum chemistry code Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II, (113-124)
  98. Fournié M, Renon N, Renard Y and Ruiz D CFD parallel simulation using Getfem++ and mumps Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II, (77-88)
  99. Agullo E, Bouwmeester H, Dongarra J, Kurzak J, Langou J and Rosenberg L Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures Proceedings of the 9th international conference on High performance computing for computational science, (129-138)
  100. Gustavson F Cache blocking Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I, (22-32)
  101. Kågström B, Kressner D and Shao M On aggressive early deflation in parallel variants of the QR algorithm Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I, (1-10)
  102. Mikkelsen C and Kågström B Parallel solution of narrow banded diagonally dominant linear systems Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, (280-290)
  103. Dinan J, Balaji P, Lusk E, Sadayappan P and Thakur R Hybrid parallel programming with MPI and unified parallel C Proceedings of the 7th ACM international conference on Computing frontiers, (177-186)
  104. Howell G Block Householder computation of sparse matrix singular values Proceedings of the 2010 Spring Simulation Multiconference, (1-8)
  105. Robert Y and Vivien F Algorithmic issues in grid computing Algorithms and theory of computation handbook, (29-29)
  106. Rouson D, Adalsteinsson H and Xia J (2010). Design patterns for multiphysics modeling in Fortran 2003 and C++, ACM Transactions on Mathematical Software, 37:1, (1-30), Online publication date: 1-Jan-2010.
  107. Seraphim T, Seraphim E and Travieso G (2009). HieraAnalyses – a tool for hierarchical analysis of parallel programs, International Journal of High Performance Systems Architecture, 2:1, (58-67), Online publication date: 1-Dec-2009.
  108. Bekas C, Curioni A and Fedulova I Low cost high performance uncertainty quantification Proceedings of the 2nd Workshop on High Performance Computational Finance, (1-8)
  109. Agullo E, Hadri B, Ltaief H and Dongarra J Comparative study of one-sided factorizations with multiple software packages on multi-core hardware Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, (1-12)
  110. Song F, YarKhan A and Dongarra J Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, (1-11)
  111. Ballard G, Demmel J, Holtz O and Schwartz O Communication-optimal parallel and sequential Cholesky decomposition Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, (245-252)
  112. Galizia A, D'Agostino D and Clematis A (2009). A Grid framework to enable parallel and concurrent TMA image analyses, International Journal of Grid and Utility Computing, 1:3, (261-271), Online publication date: 1-Aug-2009.
  113. Barrachina S, Benner P and Quintana-Orti E (2009). Parallel solution of large-scale algebraic Bernoulli equations with the matrix sign function method, International Journal of Computational Science and Engineering, 4:2, (88-93), Online publication date: 1-Jul-2009.
  114. Kurzak J and Dongarra J (2009). QR factorization for the Cell Broadband Engine, Scientific Programming, 17:1-2, (31-42), Online publication date: 1-Jan-2009.
  115. Kolberg M, Bohlender G and Claudio D Improving the Performance of a Verified Linear System Solver Using Optimized Libraries and Parallel Computation High Performance Computing for Computational Science - VECPAR 2008, (13-26)
  116. Luisier M and Klimeck G A multi-level parallel simulation approach to electron transport in nano-scale transistors Proceedings of the 2008 ACM/IEEE conference on Supercomputing, (1-10)
  117. Andersson P, Granat R, Jonsson I and Kågström B Parallel Algorithms for Triangular Periodic Sylvester-Type Matrix Equations Proceedings of the 14th international Euro-Par conference on Parallel Processing, (780-789)
  118. Vidal A, Garcia V, Alonso P and Bernabeu M (2008). Parallel computation of the eigenvalues of symmetric Toeplitz matrices through iterative methods, Journal of Parallel and Distributed Computing, 68:8, (1113-1121), Online publication date: 1-Aug-2008.
  119. Sanjay H and Vadhiyar S (2008). Performance modeling of parallel applications for grid scheduling, Journal of Parallel and Distributed Computing, 68:8, (1135-1145), Online publication date: 1-Aug-2008.
  120. Bientinesi P, Gunter B and Geijn R (2008). Families of algorithms related to the inversion of a Symmetric Positive Definite matrix, ACM Transactions on Mathematical Software, 35:1, (1-22), Online publication date: 22-Jul-2008.
  121. Liu L, Li Z and Sameh A Analyzing memory access intensity in parallel programs on multicore Proceedings of the 22nd annual international conference on Supercomputing, (359-367)
  122. Bai Y and Ward R (2008). Parallel block tridiagonalization of real symmetric matrices, Journal of Parallel and Distributed Computing, 68:5, (703-715), Online publication date: 1-May-2008.
  123. Sala M, Stanley K and Heroux M (2008). On the design of interfaces to sparse direct solvers, ACM Transactions on Mathematical Software, 34:2, (1-22), Online publication date: 1-Mar-2008.
  124. Sala M, Spotz W and Heroux M (2008). PyTrilinos, ACM Transactions on Mathematical Software, 34:2, (1-33), Online publication date: 1-Mar-2008.
  125. Dongarra J, Pineau J, Robert Y and Vivien F Matrix product on heterogeneous master-worker platforms Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, (53-62)
  126. De Llano R and Bosque J Parallel implementation of a neural net training application in a heterogeneous grid environment Proceedings of the 2007 OTM confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part II, (1473-1488)
  127. Konda T and Nakamura Y Parallel double divide and conquer and its evaluation on a super computer Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, (231-236)
  128. Carracciuolo L, Laccetti G and Lapegna M Implementing effective data management policies in distributed and grid computing environments Proceedings of the 7th international conference on Parallel processing and applied mathematics, (902-911)
  129. Lastovetsky A and Reddy R A novel algorithm of optimal matrix partitioning for parallel dense factorization on heterogeneous processors Proceedings of the 9th international conference on Parallel Computing Technologies, (261-275)
  130. Sudarsan R and Ribbens C Efficient multidimensional data redistribution for resizable parallel computations Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications, (182-194)
  131. Marker B, Van Zee F, Goto K, Quintana-Ortí G and van de Geijn R Toward scalable matrix multiply on multithreaded architectures Proceedings of the 13th international Euro-Par conference on Parallel Processing, (748-757)
  132. Eyraud-Dubois L, Legrand A, Quinson M and Vivien F A first step towards automatically building network representations Proceedings of the 13th international Euro-Par conference on Parallel Processing, (160-169)
  133. Luszczek P and Dongarra J (2007). High Performance Development for High End Computing With Python Language Wrapper (PLW), International Journal of High Performance Computing Applications, 21:3, (360-369), Online publication date: 1-Aug-2007.
  134. Raghunathan S (2007). Parallel Computing Algorithms and Applications, Computing in Science and Engineering, 9:4, (64-65), Online publication date: 1-Jul-2007.
  135. Zhang H, Smith B, Sternberg M and Zapol P (2007). SIPs, ACM Transactions on Mathematical Software, 33:2, (9-es), Online publication date: 1-Jun-2007.
  136. Acevedo L, Garcia V and Vidal A Compatibility of Scalapack with the Discrete Wavelet Transform Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007, (152-159)
  137. Fukuda M, Braams B, Nakata M, Overton M, Percus J, Yamashita M and Zhao Z (2007). Large-scale semidefinite programs in electronic structure calculation, Mathematical Programming: Series A and B, 109:2-3, (553-580), Online publication date: 1-Mar-2007.
  138. Reddy R and Lastovetsky A HeteroMPI+ScaLAPACK Proceedings of the 13th international conference on High Performance Computing, (242-253)
  139. Bernabeu M and Vidal A Static versus dynamic heterogeneous parallel schemes to solve the symmetric tridiagonal eigenvalue problem Proceedings of the 6th WSEAS international conference on Applied computer science, (301-306)
  140. Bonelli A, Franchetti F, Lorenz J, Püschel M and Ueberhuber C Automatic performance optimization of the discrete fourier transform on distributed memory computers Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications, (818-832)
  141. Langou J, Langou J, Luszczek P, Kurzak J, Buttari A and Dongarra J Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems) Proceedings of the 2006 ACM/IEEE conference on Supercomputing, (113-es)
  142. ACM
    Gygi F, Draeger E, Schulz M, de Supinski B, Gunnels J, Austel V, Sexton J, Franchetti F, Kral S, Ueberhuber C and Lorenz J Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform Proceedings of the 2006 ACM/IEEE conference on Supercomputing, (45-es)
  143. Raghunathan S (2006). Making a Supercomputer Do What You Want, Computing in Science and Engineering, 8:5, (70-80), Online publication date: 1-Sep-2006.
  144. Gradl T, Spörl A, Huckle T, Glaser S and Schulte-Herbrüggen T Parallelising matrix operations on clusters for an optimal control-based quantum compiler Proceedings of the 12th international conference on Parallel Processing, (751-762)
  145. Badía J, Benner P, Mayo R and Quintana-Ortí E Parallel solution of large-scale and sparse generalized algebraic riccati equations Proceedings of the 12th international conference on Parallel Processing, (710-719)
  146. Zhuo L and Prasanna V Scalable Hybrid Designs for Linear Algebra on Reconfigurable Computing Systems Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1, (87-95)
  147. Kajiyama T, Nukada A, Suda R, Hasegawa H and Nishida A Distributed SILC Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (860-870)
  148. Stpiczyński P New data distribution for solving triangular systems on distributed memory machines Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (589-597)
  149. Gustavson F, Karlsson L and Kågström B Three algorithms for Cholesky factorization on distributed memory using packed storage Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (550-559)
  150. Drummond L, Galiano V, Migallón V and Penadés J High-level user interfaces for the DOE ACTS collection Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (251-259)
  151. Granat R and Kågström B Parallel algorithms and condition estimators for standard and generalized triangular Sylvester-type matrix equations Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (127-136)
  152. Adlerborn B, Kågström B and Kressner D Parallel variants of the multishift QZ algorithm with advanced deflation techniques Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (117-126)
  153. Marques O and Vasconcelos P Evaluation of linear solvers for astrophysics transfer problems Proceedings of the 7th international conference on High performance computing for computational science, (466-475)
  154. Drummond L, Galiano V, Marques O, Migallón V and Penadés J PyACTS Proceedings of the 7th international conference on High performance computing for computational science, (417-425)
  155. Alonso P, Bernabeu M and Vidal A A parallel solution of hermitian toeplitz linear systems Proceedings of the 6th international conference on Computational Science - Volume Part I, (348-355)
  156. Krishnan M and Nieplocha J Memory efficient parallel matrix multiplication operation for irregular problems Proceedings of the 3rd conference on Computing frontiers, (229-240)
  157. Nieplocha J, Tipparaju V, Krishnan M and Panda D (2006). High Performance Remote Memory Access Communication, International Journal of High Performance Computing Applications, 20:2, (233-253), Online publication date: 1-May-2006.
  158. Allan B, Armstrong R, Bernholdt D, Bertrand F, Chiu K, Dahlgren T, Damevski K, Elwasif W, Epperly T, Govindaraju M, Katz D, Kohl J, Krishnan M, Kumfert G, Larson J, Lefantzi S, Lewis M, Malony A, McInnes L, Nieplocha J, Norris B, Parker S, Ray J, Shende S, Windus T and Zhou S (2006). A Component Architecture for High-Performance Scientific Computing, International Journal of High Performance Computing Applications, 20:2, (163-202), Online publication date: 1-May-2006.
  159. Bahi J, Domas S and Mazouzi K More on JACE Proceedings of the 20th international conference on Parallel and distributed processing, (231-231)
  160. Dongarra J, Bosilca G, Chen Z, Eijkhout V, Fagg G, Fuentes E, Langou J, Luszczek P, Pjesivac-Grbovic J, Seymour K, You H and Vadhiyar S (2006). Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, 50:2/3, (223-238), Online publication date: 1-Mar-2006.
  161. Polizzi E and Sameh A (2006). A parallel hybrid banded system solver, Parallel Computing, 32:2, (177-194), Online publication date: 1-Feb-2006.
  162. Nakata K, Yamashita M, Fujisawa K and Kojima M (2006). A parallel primal-dual interior-point method for semidefinite programs using positive definite matrix completion, Parallel Computing, 32:1, (24-43), Online publication date: 1-Jan-2006.
  163. Pan L, Lai M, Dillencourt M and Bic L Mobile pipelines Proceedings of the 12th international conference on High Performance Computing, (201-212)
  164. Gygi F, Yates R, Lorenz J, Draeger E, Franchetti F, Ueberhuber C, Supinski B, Kral S, Gunnels J and Sexton J Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code Proceedings of the 2005 ACM/IEEE conference on Supercomputing
  165. Alonso P and Vidal A An efficient parallel solution of complex toeplitz linear systems Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (486-493)
  166. Lastovetsky A and Reddy R A variable group block distribution strategy for dense factorizations on networks of heterogeneous computers Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (1074-1081)
  167. Kalinov A, Ledovskikh I, Posypkin M, Levchenko Z and Chizhov V A fortran evolution of mpc parallel programming language Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (936-943)
  168. Kalinov A, Ledovskikh I, Posypkin M, Levchenko Z and Chizhov V An implementation of the matrix multiplication algorithm SUMMA in mpf Proceedings of the 8th international conference on Parallel Computing Technologies, (420-432)
  169. Tinetti F and De Giusti A Broadcast-Based parallel LU factorization Proceedings of the 11th international Euro-Par conference on Parallel Processing, (867-876)
  170. Gansterer W and Zottl J Parallelization of divide-and-conquer eigenvector accumulation Proceedings of the 11th international Euro-Par conference on Parallel Processing, (847-856)
  171. de Carvalho Junior F and Lins R Using aspects for supporting procedural modules in # programming Proceedings of the 11th international Euro-Par conference on Parallel Processing, (730-739)
  172. Alonso P, Badía J and Vidal A (2005). An Efficient Parallel Algorithm to Solve Block-Toeplitz Systems, The Journal of Supercomputing, 32:3, (251-278), Online publication date: 1-Jun-2005.
  173. Elwasif W, Batchelor D, Bernholdt D, Berry L, D'Azevedo E, Houlberg W, Jaeger E, Kohl J and Li S Coupled fusion simulation using the common component architecture Proceedings of the 5th international conference on Computational Science - Volume Part I, (372-379)
  174. Alberti P, García V and Vidal A Parallel resolution with newton algorithms of the inverse non-symmetric eigenvalue problem Proceedings of the 5th international conference on Computational Science - Volume Part I, (229-236)
  175. Alonso P and Vidal A The symmetric–toeplitz linear system problem in parallel Proceedings of the 5th international conference on Computational Science - Volume Part I, (220-228)
  176. Shah V and Gilbert J Sparse matrices in Matlab*P Proceedings of the 11th international conference on High Performance Computing, (144-155)
  177. Becerra G and Maciá A Parallel global and local convergent algorithms for solving the inverse additive singular value problem Proceedings of the 4th WSEAS International Conference on Systems Theory and Scientific Computation, (1-6)
  178. Matthey T, Cickovski T, Hampton S, Ko A, Ma Q, Nyerges M, Raeder T, Slabach T and Izaguirre J (2004). ProtoMol, an object-oriented framework for prototyping novel algorithms for molecular dynamics, ACM Transactions on Mathematical Software, 30:3, (237-265), Online publication date: 1-Sep-2004.
  179. Gansterer W, Bai Y, Day R and Ward R (2004). A Framework for Approximating Eigenpairs in Electronic Structure Computations, Computing in Science and Engineering, 6:5, (50-59), Online publication date: 1-Sep-2004.
  180. Whiley M and Wilson S (2004). Parallel algorithms for Markov chain Monte Carlo methods in latent spatial Gaussian models, Statistics and Computing, 14:3, (171-179), Online publication date: 1-Aug-2004.
  181. Kalinov A and Ledovskikh I (2004). An Extension of Fortran for High Performance Parallel Computing, Programming and Computing Software, 30:4, (209-217), Online publication date: 1-Jul-2004.
  182. Alonso P, Badía J and Vidal A An efficient and stable parallel solution for non-symmetric toeplitz linear systems Proceedings of the 6th international conference on High Performance Computing for Computational Science, (685-698)
  183. Arias E and Hernández V Numerical integration of the differential riccati equation Proceedings of the 6th international conference on High Performance Computing for Computational Science, (671-684)
  184. Peinado J and Vidal A Three parallel algorithms for solving nonlinear systems and optimization problems Proceedings of the 6th international conference on High Performance Computing for Computational Science, (657-670)
  185. Drummond L, Hernandez V, Marques O, Roman J and Vidal V A survey of high-quality computational libraries and their impact in science and engineering applications Proceedings of the 6th international conference on High Performance Computing for Computational Science, (37-50)
  186. Benner P, Quintana-Ortí E and Quintana-Ortí G Parallel model reduction of large linear descriptor systems via balanced truncation Proceedings of the 6th international conference on High Performance Computing for Computational Science, (340-353)
  187. Granat R and Kågström B Evaluating parallel algorithms for solving sylvester-type matrix equations Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (719-729)
  188. Joffrain T, Quintana-Ortí E and van de Geijn R Rapid development of high-performance out-of-core solvers Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (413-422)
  189. Radons G, Rünger G, Schwind M and Yang H Parallel algorithms for the determination of lyapunov characteristics of large nonlinear dynamical systems Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (1131-1140)
  190. Vasconcelos P and d'Almeida F Performance evaluation of a parallel algorithm for a radiative transfer problem Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (864-871)
  191. Cuenca J, Giménez D and González J (2004). Architecture of an automatically tuned linear algebra library, Parallel Computing, 30:2, (187-210), Online publication date: 1-Feb-2004.
  192. References Grid resource management, (507-566)
  193. Chen Z, Dongarra J, Luszczek P and Roche K (2003). Self-adapting software for numerical linear algebra and LAPACK for clusters, Parallel Computing, 29:11-12, (1723-1743), Online publication date: 1-Nov-2003.
  194. Benner P, Quintana-Ortí E and Quintana-Ortí G (2003). State-space truncation methods for parallel model reduction of large-scale systems, Parallel Computing, 29:11-12, (1701-1722), Online publication date: 1-Nov-2003.
  195. Frens J and Wise D (2003). QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism, ACM SIGPLAN Notices, 38:10, (144-154), Online publication date: 1-Oct-2003.
  196. Yamamoto Y, Igai M and Naono K A vector-parallel FFT with a user-specifiable data distribution scheme Proceedings of the 2003 international conference on Parallel and distributed processing and applications, (362-374)
  197. Frens J and Wise D QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (144-154)
  198. Garcke J, Hegland M and Nielsen O Parallelisation of sparse grids for large scale data analysis Proceedings of the 2003 international conference on Computational science: PartIII, (683-692)
  199. Teranishi K, Raghavan P and Yang C Time-memory trade-offs using sparse matrix methods for large-scale eigenvalue problems Proceedings of the 2003 international conference on Computational science and its applications: PartI, (840-847)
  200. Addison C, Ren Y and van Waveren M (2003). OpenMP issues arising in the development of parallel BLAS and LAPACK libraries, Scientific Programming, 11:2, (95-104), Online publication date: 1-Apr-2003.
  201. D'Apuzzo M and Marino M (2003). Parallel computational issues of an interior point method for solving large bound-constrained quadratic programming problems, Parallel Computing, 29:4, (467-483), Online publication date: 1-Apr-2003.
  202. Migdalas A, Toraldo G and Kumar V (2003). Nonlinear optimization and parallel computing, Parallel Computing, 29:4, (375-391), Online publication date: 1-Apr-2003.
  203. Lawson C and Hanson R Least-squares approximation Encyclopedia of Computer Science, (963-964)
  204. Worley P, Dunigan T, Fahey M, White J and Bland A Early evaluation of the IBM p690 Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-21)
  205. Peinado J and Vidal A A parallel Newton-GMRES algorithm for solving large scale nonlinear systems Proceedings of the 5th international conference on High performance computing for computational science, (328-342)
  206. Knoop J and Mehofer E (2002). Distribution Assignment Placement, IEEE Transactions on Parallel and Distributed Systems, 13:6, (628-647), Online publication date: 1-Jun-2002.
  207. Caron E and Utard G Parallel Out-of-Core Matrix Inversion Proceedings of the 16th International Parallel and Distributed Processing Symposium
  208. Beaumont O, Legrand A, Rastello F and Robert Y (2002). Dense linear algebra kernels on heterogeneous platforms, Parallel Computing, 28:2, (155-185), Online publication date: 1-Feb-2002.
  209. Benner P, Byers R, Mayo R, Quintana-Ortí E and Hernández V (2002). Parallel Algorithms for LQ Optimal Control of Discrete-Time Periodic Linear Systems, Journal of Parallel and Distributed Computing, 62:2, (306-325), Online publication date: 1-Feb-2002.
  210. Cuenca J, Giménez D and González J Towards the design of an automatically tuned linear algebra library Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (201-208)
  211. Ruiz J, Lopera J and Carrillo J Exploiting the multilevel parallelism and the problem structure in the numerical solution of stiff ODEs Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (173-180)
  212. Trefethen A and Ford B Numerical algorithm delivery mechanisms Computational science, mathematics and software, (27-42)
  213. Boisvert R Mathematical software Computational science, mathematics and software, (3-26)
  214. Bader D, Moret B and Sanders P Algorithm engineering for parallel computation Experimental algorithmics, (1-23)
  215. Tatebe O, Nagashima U, Sekiguchi S, Kitabayashi H and Hayashida Y Design and implementation of FMPL, a fast message-passing library for remote memory operations Proceedings of the 2001 ACM/IEEE conference on Supercomputing, (15-15)
  216. Petitet A, Blackford S, Dongarra J, Ellis B, Fagg G, Roche K and Vadhiyar S Numerical libraries and the grid Proceedings of the 2001 ACM/IEEE conference on Supercomputing, (14-14)
  217. Beaumont O, Boudet V, Rastello F and Robert Y (2001). Matrix Multiplication on Heterogeneous Platforms, IEEE Transactions on Parallel and Distributed Systems, 12:10, (1033-1051), Online publication date: 1-Oct-2001.
  218. Beaumont O, Boudet V and Petitet A (2001). A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers), IEEE Transactions on Computers, 50:10, (1052-1070), Online publication date: 1-Oct-2001.
  219. Carpenter B, Fox G, Lee H and Lim S Translation schemes for the HP java parallel programming language Proceedings of the 14th international conference on Languages and compilers for parallel computing, (18-32)
  220. Hauser T, Mattox T, LeBeau R, Dietz H and Huang P High-cost CFD on a low-cost cluster Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (55-es)
  221. Beaumont O, Boudet V, Rastello F and Robert Y Matrix-Matrix Multiplication on Heterogeneous Platforms Proceedings of the 2000 International Conference on Parallel Processing
  222. Kontoghiorghes E (2000). Parallel Strategies for Solving SURE Models with Variance Inequalities and Positivity of Correlations Constraints, Computational Economics, 15:1-2, (89-106), Online publication date: 1-Apr-2000.
  223. Benner P, Castillo M, Quintana-Ortí E and Hernández V (2000). Parallel Partial Stabilizing Algorithms for Large Linear Control Systems, The Journal of Supercomputing, 15:2, (193-206), Online publication date: 1-Feb-2000.
  224. Park N, Prasanna V and Raghavendra C (1999). Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets, IEEE Transactions on Parallel and Distributed Systems, 10:12, (1217-1240), Online publication date: 1-Dec-1999.
  225. Petitet A and Dongarra J (1999). Algorithmic Redistribution Methods for Block-Cyclic Decompositions, IEEE Transactions on Parallel and Distributed Systems, 10:12, (1201-1216), Online publication date: 1-Dec-1999.
  226. Sears M, Stanley K and Henry G Application of a high performance parallel eigensolver to electronic structure calculations Proceedings of the 1998 ACM/IEEE conference on Supercomputing, (1-1)
  227. Whaley R and Dongarra J Automatically tuned linear algebra software Proceedings of the 1998 ACM/IEEE conference on Supercomputing, (1-27)
  228. Li X and Demmel J Making sparse Gaussian elimination scalable by static pivoting Proceedings of the 1998 ACM/IEEE conference on Supercomputing, (1-17)
  229. Saltz J, Sussman A, Graham S, Demmel J, Baden S and Dongarra J (1998). Programming tools and environments, Communications of the ACM, 41:11, (64-73), Online publication date: 1-Nov-1998.
  230. Casanova H and Dongarra J (1998). Applying NetSolve's Network-Enabled Server, IEEE Computational Science & Engineering, 5:3, (57-67), Online publication date: 1-Jul-1998.
  231. Casanova H and Dongarra J NetSolve Proceedings of the Seventh Heterogeneous Computing Workshop
  232. Kennedy K, Bender C, Connolly J, Hennessy J, Vernon M and Smarr L (1997). A nationwide parallel computing environment, Communications of the ACM, 40:11, (62-72), Online publication date: 1-Nov-1997.
Contributors
  • The University of Tennessee System
  • Soongsil University
  • University of Portland
  • Lawrence Livermore National Laboratory
  • University of California, Berkeley
  • The University of Texas at Austin
  • The University of Manchester
  • Intel Corporation
  • Sun Microsystems
  • Oberlin College and Conservatory
  • Cardiff University
  • Florida State University
  • The University of Tennessee, Knoxville

Reviews

Charles Raymond Crawford

ScaLAPACK is a library of routines for solving linear algebra problems on multiprocessor systems with distributed memory. It is designed to be easily portable and has been implemented on message-passing systems including PVM, MPI, the Intel series NX, the IBM SP series, the Thinking Machines CM-5, and the Cray T3 series. ScaLAPACK is an extension of LAPACK in which the algorithms are based on block partitions of the associated matrices, so that the computation can be done using vector-matrix or matrix-matrix operations. In ScaLAPACK, the latter operations are performed by modified versions of the Level 3 Basic Linear Algebra Subprograms (BLAS), which the authors refer to as the Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS, in turn, use system-dependent routines called the Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks. The BLACS are the only part of the package that must be system-dependent, although performance can be enhanced with platform-specific implementations of the PBLAS. The PBLAS and BLACS routines are written in C, while the remainder of the package is in Fortran 77. ScaLAPACK is available on the Internet or can be purchased on a CD-ROM. Both distributions contain all the source code as well as prebuilt versions of the BLACS for various platforms.

This user's guide provides an overview of the package. The first three chapters present details of the algorithms used for each of the problems. Chapter 4 explains concepts particular to the multiprocessor environment: process grids, contexts, scoped operations, and data descriptors. Details are provided for in-core dense and banded matrices as well as for out-of-core dense matrices. The final chapters deal with the performance, accuracy, and stability of the package and provide advice on troubleshooting. There is a complete bibliography as well as a keyword index.
