ABSTRACT
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. it is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
- 1.A. Aggarwal, A. K. Chandra, and M. Snir. On Communication Latency in PRAM Computation. in Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, pages 11-21. ACM, June 1989. Google ScholarDigital Library
- 2.A. Aggarwal, A. K. Chandra, and M. Snir. Communication Complexity of PRAMs. In Theoretical Computer Science, pages 3-28, March 1990. Google ScholarDigital Library
- 3.B. Alpem, L. Carter, E. Feig, and T. Selker. The Uniform Memory Hierarchy Model of Computation. Algorithmica, 1993. (to appear).Google Scholar
- 4.A. Bar-Noy and S. Kipnis. Designing broadcasting algorithms in the postal model for message-passing systems. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, pages 11-22, June 1992. Google ScholarDigital Library
- 5.G. Bell. Ultracomputers: A Teraflop Before Its Time. Communications of the Association for Computing Machinery, 35(8):26-47, August 1992. Google ScholarDigital Library
- 6.G.E. Blelloch. Scans as Primitive Parallel Operations. In Proceedings of lnternational Conference on Parallel Processing, pages 355-362, 1987.Google Scholar
- 7.R. Cole and O. Zajicek. The APRAM: Incorporating asynchrony into the PRAM model. In Proceedings of the Symposium on Parallel Architectures and Algorithms, pages 169- 178, 1989. Google ScholarDigital Library
- 8.J.M. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp, 19:297- 301, 1965.Google ScholarDigital Library
- 9.D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a Realistic Model of Parallel Computation. Technical Report UCB/CSD 92/713, UC Berkeley, 1992. Google ScholarDigital Library
- 10.W. et al Dally. The J-Machine: A Fine-Grain Concurrent Computer. In IFIP Congress, 1989.Google Scholar
- 11.W. J. Dally. Performance Analysis of k-ary n-cube Intercormection Networks. IEEE Transaction on Computers, 39(6):775-785, June 1990. Google ScholarDigital Library
- 12.Lenoski D. et al. The Stanford Dash Multiprocessor. IEEE Computer, 25(3):63-79, March 1992. Google ScholarDigital Library
- 13.S. Fortune and J. Wyllie. Parallelism in Random Access Machines. In Proceedings of the l Oth Annual Symposium on Theory of Computing, pages 114-118, 1978. Google ScholarDigital Library
- 14.P. B. Gibbons. A More Practical PRAM Model. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, pages 158-168. ACM, 1989. Google ScholarDigital Library
- 15.J. L. Hennessy. MIPS R4000 Overview. In Proc. of the 19th lnt'l Symposium on Computer Architecture, Gold Coast, Australia, May 1992.Google Scholar
- 16.J.L. Hennessy and D. A. Patterson. Computer Architecture-- A Quantitative Approach. Morgan Kaufmann, 1990. Google ScholarDigital Library
- 17.IEEE. Symposium Record-- Hot Chips IV, August 1992.Google Scholar
- 18.P. Kanellakis and A. Shvartsman. Efficient parallel algorithms can be made robust. In Proceedings of the 8th Symposium on Principles of Distributed Computing, pages 211-221,1989. Google ScholarDigital Library
- 19.R. M. Karp, M. Luby, and E Meyer auf der Heide. Efficient PRAM Simulation on a Distributed Memory Machine. In Proceedings of the Twenty-Fourth Annual ACM Symposium of the Theory of Computing, pages 318-326, May 1992. Google ScholarDigital Library
- 20.R.M. Karp, A. Sahay, E. Santos, and K. E. Schauser. Optimal Broadcast and Summation in the LogP Model. Technical Report UCB/CSD 92/721, UC Berkeley, 1992. Google ScholarDigital Library
- 21.Z.M. Kedem, K. V. Palem, and P. G. Spirakis. Efficient robust parallel computations. In Proceedings of the 22nd Annual Symposium on Theory of Computing, pages 138-148, 1990. Google ScholarDigital Library
- 22.C.U.Martel andA. Raghunathan.Asynchronous PRAMs with memory latency. Technical report, University of California, Davis, Division of Computer Science, 1991.Google Scholar
- 23.K. Mehlhorn and U. Vishkin. Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, 21:339- 374, 1984. Google ScholarDigital Library
- 24.C. H. Papadimitriou and M. Yannakakis. Towards an Architecture-Independent Analysis of Parallel Algorithms. In Proceedings of the Twentieth Annual ACM Symposium of the Theory of Computing, pages 510-513. ACM, 1988. Google ScholarDigital Library
- 25.G. M. Papadopoulos and D. E. Culler. Monsoon: an Explicit Token-Store Architecture. In Proc. of the 17th Annual Int. Syrup. on Comp. Arch., Seattle, Washington, May 1990. Google ScholarDigital Library
- 26.A.G. Ranade. How to emulate shared memory.in Proceedings of the 28th IEEE Annual Symposium on Foundations of Computer Science, pages 185-194, 1987.Google ScholarDigital Library
- 27.A. Sahay. Hiding Communication Costs in Bandwidth- Limited Parallel FFT Computation. Technical Report UCB/CSD 92/722, UC Berkeley, 1992. Google ScholarDigital Library
- 28.L. Snyder. Type Architectures, Shared Memory, and the Corollary of Modest Potential. In Ann. Rev. Comput. Sci., pages 289-317. Annual Reviews Inc., 1986. Google ScholarDigital Library
- 29.R. Subramonian. The influence of limited bandwidth on algorithm design and implementation. In Dartmouth Institute for Advanced Graduate Studies in Parallel Computation (DAGS/PC), June 1992.Google Scholar
- 30.L. G. Valiant. A Bridging Model for Parallel Computation. Communications of the Association for Computing Machinery, 33(8):103-11, August 1990. Google ScholarDigital Library
- 31.T. yon Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser. Active Messages: a Mechanism for Integrated Communication and Computation. in Proc. of the 19th Int'l Symposium on Computer Architecture, Gold Coast, Australia, May 1992. Google ScholarDigital Library
Index Terms
- LogP: towards a realistic model of parallel computation
Recommendations
LogP: towards a realistic model of parallel computation
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of ...
MPPs and clusters for scalable computing
ISPAN '96: Proceedings of the 1996 International Symposium on Parallel Architectures, Algorithms and NetworksThis article assess the state-of-the-art technology in massively parallel processors (MPPs) and clusters of workstations (COWs) for scalable parallel computing. We evaluate the IBM SP2, the Intel Paragon, the Cray T3D/T3E, and the ASCI TeraFLOPS system ...
Comments