ABSTRACT
End-point hotspots can cause major slowdowns in interconnection networks due to head-of-line blocking and congestion. Avoiding congestion is therefore important for sustaining high network performance, especially where congestion is permanent and so causes a permanent slowdown. Permanent congestion occurs when traffic has been moved away from a failed link, when multiple jobs run on the same system and compete for network resources, or when a system is not balanced for the application that runs on it.
In this paper we suggest a mechanism for dynamic allocation of virtual lanes and live optimization of the distribution of flows between the allocated virtual lanes. The purpose is to alleviate the negative effect of permanent congestion by separating network flows into slow-lane and fast-lane traffic. Flows destined for an end-point hot spot are placed in the slow lane and all other flows are placed in the fast lane. Consequently, the flows in the fast lane are unaffected by the head-of-line blocking created by the hot-spot traffic.
We demonstrate the feasibility of this approach using a modified version of OFED and OpenSM with fat-tree routing on a small InfiniBand cluster. Our experiments show an increase in throughput ranging from 150% to 468% compared to the conventional fat-tree algorithm in OFED.
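The slow-lane/fast-lane separation described above can be illustrated with a minimal sketch. It is not the OpenSM implementation: the names `Flow`, `assign_lanes`, `FAST_LANE`, and `SLOW_LANE` are hypothetical, and the set of hot-spot destinations is assumed to be given as input, since the abstract does not state how hot spots are detected.

```python
from dataclasses import dataclass

# Hypothetical lane identifiers; real virtual lane numbers depend on the fabric setup.
FAST_LANE = 0  # traffic not destined for a hot spot
SLOW_LANE = 1  # traffic destined for an end-point hot spot


@dataclass(frozen=True)
class Flow:
    """A flow identified by its source and destination end-points."""
    src: str
    dst: str


def assign_lanes(flows, hot_spots):
    """Place flows destined for a hot spot in the slow lane, all others in the fast lane."""
    return {
        flow: SLOW_LANE if flow.dst in hot_spots else FAST_LANE
        for flow in flows
    }


# Illustrative input: two flows converge on end-point "n5", one does not.
flows = [Flow("n1", "n5"), Flow("n2", "n5"), Flow("n3", "n4")]
lanes = assign_lanes(flows, hot_spots={"n5"})
```

Rerunning `assign_lanes` whenever the set of hot spots changes gives a rough picture of what redistributing flows between lanes might look like. The mechanism in the paper operates on the routing and virtual lane configuration of the subnet, which this sketch does not model.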
Index Terms
- dFtree: a fat-tree routing algorithm using dynamic allocation of virtual lanes to alleviate congestion in infiniband networks