Optimal Broadcasting in Mesh-Connected Architectures
December 1991
1991 Technical Report
Publisher:
  • University of Texas at Austin
  • Computer Science Dept. Taylor Hall 2.124 Austin, TX
  • United States
Published: 01 December 1991
Abstract

In this paper, we disprove the common assumption that the time for broadcasting in a mesh is at best proportional to the square root of the number of processors, at least in the presence of worm-hole routing. We present an optimal algorithm for broadcasting in mesh-connected distributed-memory architectures with worm-hole routing. By organizing the processing nodes in a logical spanning tree, the algorithm executes in time proportional to the logarithm of the number of nodes without inducing contention in the communication network. We restrict the number of nodes in each dimension of the processor mesh to be a power of two. Our method provides insight into how to avoid and/or reduce network contention on meshes for other communication operations. Experimental results on the Intel Touchstone Delta system are included.
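The logarithmic, contention-free behaviour described in the abstract can be illustrated with a recursive-halving schedule on a single row of the mesh. The sketch below is only an illustration under stated assumptions, not necessarily the report's exact algorithm: it models one dimension as a linear array of `n` nodes (`n` a power of two), assumes a message from `src` to `dst` occupies every link between them, and treats links as undirected, which is stricter than a bidirectional network requires. The names `broadcast_schedule`, `links_used`, and `is_contention_free` are hypothetical.

```python
def broadcast_schedule(n, root=0):
    """Return the broadcast as a list of steps; each step is a list of (src, dst) sends."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    steps = []
    # Each segment (lo, hi, holder) covers nodes lo..hi-1; exactly one node holds the message.
    segments = [(0, n, root)]
    while segments[0][1] - segments[0][0] > 1:
        step, next_segments = [], []
        for lo, hi, holder in segments:
            half = (hi - lo) // 2
            mid = lo + half
            # The partner sits at the same offset in the other half of the segment.
            partner = holder + half if holder < mid else holder - half
            step.append((holder, partner))
            if holder < mid:
                next_segments += [(lo, mid, holder), (mid, hi, partner)]
            else:
                next_segments += [(lo, mid, partner), (mid, hi, holder)]
        steps.append(step)
        segments = next_segments
    return steps


def links_used(src, dst):
    """Links (i, i+1) occupied by a message routed along the array from src to dst."""
    a, b = sorted((src, dst))
    return {(i, i + 1) for i in range(a, b)}


def is_contention_free(steps):
    """True if no two messages in the same step share a link."""
    for step in steps:
        seen = set()
        for src, dst in step:
            links = links_used(src, dst)
            if seen & links:
                return False
            seen |= links
    return True


if __name__ == "__main__":
    for i, step in enumerate(broadcast_schedule(8), start=1):
        print(f"step {i}: {step}")
```

Each step halves every segment, so the schedule has log2(n) steps, and every message stays inside its own segment, so no two messages in a step share a link. In this model, a two-dimensional mesh could be handled by running the schedule along the root's row and then along every column.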

Cited By

  1. Hoefler T and Moor D (2014). Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations, Supercomputing Frontiers and Innovations: an International Journal, 1:2, (58-75), Online publication date: 9-Jul-2014.
  2. Sack P and Gropp W Faster topology-aware collective algorithms through non-minimal communication Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, (45-54)
  3. Sack P and Gropp W (2012). Faster topology-aware collective algorithms through non-minimal communication, ACM SIGPLAN Notices, 47:8, (45-54), Online publication date: 11-Sep-2012.
  4. Chan E, van de Geijn R, Gropp W and Thakur R Collective communication on architectures that support simultaneous communication over multiple links Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, (2-11)
  5. Sivaram R, Stunkel C and Panda D (2000). Implementing Multidestination Worms in Switch-Based Parallel Systems, IEEE Transactions on Parallel and Distributed Systems, 11:8, (794-812), Online publication date: 1-Aug-2000.
  6. Sivaram R, Panda D and Stunkel C (1998). Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding, IEEE Transactions on Parallel and Distributed Systems, 9:10, (1004-1028), Online publication date: 1-Oct-1998.
  7. Stunkel C, Sivaram R and Panda D Implementing multidestination worms in switch-based parallel systems Proceedings of the 24th annual international symposium on Computer architecture, (50-61)
  8. Stunkel C, Sivaram R and Panda D (1997). Implementing multidestination worms in switch-based parallel systems, ACM SIGARCH Computer Architecture News, 25:2, (50-61), Online publication date: 1-May-1997.
  9. Barnett M, Payne D, van de Geijn R and Watts J (1996). Broadcasting on Meshes with Wormhole Routing, Journal of Parallel and Distributed Computing, 35:2, (111-122), Online publication date: 15-Jun-1996.
  10. Peters J and Syska M (1996). Circuit-Switched Broadcasting in Torus Networks, IEEE Transactions on Parallel and Distributed Systems, 7:3, (246-255), Online publication date: 1-Mar-1996.
  11. Ranka S, Wang J and Fox G (1994). Static and Run-Time Algorithms for All-to-Many Personalized Communication on Permutation Networks, IEEE Transactions on Parallel and Distributed Systems, 5:12, (1266-1274), Online publication date: 1-Dec-1994.
  12. Tsai Y and McKinley P A dominating set model for broadcast in all-port wormhole-routed 2D mesh networks Proceedings of the 8th international conference on Supercomputing, (126-135)
  13. Barnett M, Gupta S, Payne D, Shuler L, van de Geijn R and Watts J Building a high-performance collective communication library Proceedings of the 1994 ACM/IEEE conference on Supercomputing, (107-116)
  14. Kee K and Hariri S Efficient communication algorithms for pipeline multicomputers Proceedings of the 1994 ACM/IEEE conference on Supercomputing, (468-477)
  15. Lewis J and van de Geijn R Distributed memory matrix-vector multiplication and conjugate gradient algorithms Proceedings of the 1993 ACM/IEEE conference on Supercomputing, (484-492)
Contributors
  • Microsoft Research
  • Intel Corporation
  • The University of Texas at Austin
