In this paper, we disprove the common assumption that broadcasting on a mesh takes time at best proportional to the square root of the number of processors, at least when worm-hole routing is available. We present an optimal algorithm for broadcasting on mesh-connected distributed-memory architectures with worm-hole routing. By organizing the processing nodes in a logical spanning tree, the algorithm executes in time proportional to the logarithm of the number of nodes without inducing contention in the communication network. We restrict the number of nodes in each dimension of the processor mesh to be a power of two. Our method also provides insight into how to avoid or reduce network contention on meshes for other communication operations. Experimental results on the Intel Touchstone Delta system are included.
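To make the logarithmic step count concrete, the following is a minimal sketch of a recursive-halving broadcast along one dimension of the mesh. It is an illustration under stated assumptions, not necessarily the paper's exact algorithm: the function names (`broadcast_schedule`, `links_used`, `is_contention_free`, `reached`) are hypothetical, links are modeled as undirected (a stricter condition than bidirectional channels would require), and a message from node `a` to node `b` is assumed to occupy every link between them for the duration of the step, as a simplified model of worm-hole routing.

```python
def broadcast_schedule(n, root=0):
    """Return a list of steps; each step is a list of (src, dst) sends
    on a line of n nodes. n must be a power of two (assumption taken
    from the abstract's restriction on mesh dimensions)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 0 <= root < n:
        raise ValueError("root out of range")
    steps = []
    # Each segment is (lo, hi, holder): nodes lo..hi-1, where `holder`
    # already owns the message. All segments have the same size.
    segments = [(0, n, root)]
    while segments[0][1] - segments[0][0] > 1:
        step, next_segments = [], []
        for lo, hi, holder in segments:
            mid = (lo + hi) // 2
            half = mid - lo
            # Send to the node at the same offset in the other half.
            if holder < mid:
                partner = holder + half
                next_segments += [(lo, mid, holder), (mid, hi, partner)]
            else:
                partner = holder - half
                next_segments += [(lo, mid, partner), (mid, hi, holder)]
            step.append((holder, partner))
        steps.append(step)
        segments = next_segments
    return steps


def links_used(src, dst):
    """Links occupied by a message; link i joins node i and node i + 1."""
    a, b = sorted((src, dst))
    return set(range(a, b))


def is_contention_free(steps):
    """True if no link carries two messages within the same step."""
    for step in steps:
        busy = set()
        for src, dst in step:
            links = links_used(src, dst)
            if busy & links:
                return False
            busy |= links
    return True


def reached(steps, root):
    """Set of nodes that hold the message after all steps."""
    have = {root}
    for step in steps:
        for src, dst in step:
            assert src in have, "sender must already hold the message"
            have.add(dst)
    return have
```

Because every message in a step stays inside its own segment, and the segments are disjoint, no two messages share a link. Under the same assumptions, a two-dimensional mesh could apply this schedule first along the root's row and then along all columns in parallel, for a total of log2(rows) + log2(columns) steps, which equals the base-two logarithm of the number of nodes.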