We consider FFT computations on parallel machines in which communication speed is limited by network bandwidth rather than by message overhead. Such machines are modeled well by a special case of the LogP model of Culler et al. We show how commonly used parallel programming techniques such as intelligent data placement, careful co-ordination of computation and communication, and pipelining long streams of messages can yield 100% efficiency (in speedup) even in the presence of substantial message traffic. At a different level, our parallel algorithm can be viewed as an efficient simulation of a butterfly network on the LogP model in the presence of reasonable slack.
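The role of data placement can be illustrated with the well-known hybrid cyclic/blocked layout for the radix-2 butterfly. The following is a sequential Python simulation under stated assumptions, not necessarily the exact algorithm of this paper. It assumes n and P are powers of two with n ≥ P² and uses a decimation-in-frequency butterfly. The helper names (`local_stage`, `parallel_fft`, `bit_reverse`, `dft`) are hypothetical. Under a cyclic layout (row i on processor i mod P), the first log(n/P) stages pair rows held by the same processor. A single remap to a blocked layout (row i on processor ⌊iP/n⌋) then makes the remaining log P stages local as well, so all communication is concentrated in one all-to-all exchange.

```python
import cmath

# Sketch (assumption): hybrid cyclic -> blocked layout for a radix-2
# decimation-in-frequency FFT, simulated sequentially. Each "processor"
# is a dict {global_row_index: complex_value}. Names are hypothetical.


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def local_stage(mem, half):
    """One butterfly stage pairing rows i and i + half, on one processor.

    Asserts that every partner row is local, i.e. no communication is needed.
    """
    for i in list(mem):
        if i & half:  # upper partner; handled together with the lower row
            continue
        partner = i | half
        assert partner in mem, "this stage would require communication"
        a, b = mem[i], mem[partner]
        pos = i & (half - 1)  # position within the butterfly block
        w = cmath.exp(-2j * cmath.pi * pos / (2 * half))  # twiddle factor
        mem[i] = a + b
        mem[partner] = (a - b) * w


def parallel_fft(x, P):
    """Return (DFT of x, number of off-processor values sent in the remap)."""
    n = len(x)
    k = n.bit_length() - 1
    p = P.bit_length() - 1
    assert 1 << k == n and 1 << p == P and n >= P * P

    # Cyclic layout: row i lives on processor i % P.
    procs = [{i: complex(x[i]) for i in range(r, n, P)} for r in range(P)]
    half = n // 2
    while half >= P:  # stages on bits >= log P are local under cyclic layout
        for mem in procs:
            local_stage(mem, half)
        half //= 2

    # The only communication: remap cyclic -> blocked (row i -> i // (n/P)).
    block = n // P
    remapped = [dict() for _ in range(P)]
    messages = 0
    for r, mem in enumerate(procs):
        for i, v in mem.items():
            dest = i // block
            remapped[dest][i] = v
            if dest != r:
                messages += 1
    procs = remapped

    while half >= 1:  # stages on bits < log P are local under blocked layout
        for mem in procs:
            local_stage(mem, half)
        half //= 2

    # Gather: a DIF FFT leaves its output in bit-reversed order.
    y = [0j] * n
    for mem in procs:
        for i, v in mem.items():
            y[bit_reverse(i, k)] = v
    return y, messages


def dft(x):
    """Naive O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


if __name__ == "__main__":
    x = [complex(t % 5, (t * 3) % 7) for t in range(64)]
    y, msgs = parallel_fft(x, 4)
    print("max error:", max(abs(a - b) for a, b in zip(y, dft(x))))
    print("values sent in remap:", msgs)
```

In this sketch each processor sends n/P − n/P² values during the remap (n − n/P in total), and no other step touches a remote row. In a bandwidth-limited LogP setting these values could be sent as one long pipelined stream, costing roughly g per message plus a latency term, provided the destinations are staggered so that no processor is flooded. That is an assumption of this illustration rather than a result quoted from the paper.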
Cited By
- Culler D, Karp R, Patterson D, Sahay A, Schauser K, Santos E, Subramonian R and von Eicken T (1993). LogP: towards a realistic model of parallel computation, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, (1-12).
- Culler D, Karp R, Patterson D, Sahay A, Schauser K, Santos E, Subramonian R and von Eicken T (1993). LogP: towards a realistic model of parallel computation, ACM SIGPLAN Notices, 28:7, (1-12), Online publication date: 1-Jul-1993.
Recommendations
Optimizing bandwidth limited problems using one-sided communication and overlap
IPDPS'06: Proceedings of the 20th international conference on Parallel and distributed processing
This paper demonstrates that the one-sided communication used in languages like UPC can provide a significant performance advantage for bandwidth-limited applications. This is shown through communication microbenchmarks and a case-study of UPC and MPI ...
High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers
In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT ...