ABSTRACT
We present Sincronia, a near-optimal network design for coflows that can be implemented on top on any transport layer (for flows) that supports priority scheduling. Sincronia achieves this using a key technical result --- we show that given a "right" ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time within 4X of the optimal as long as (co)flows are prioritized with respect to the ordering.
Sincronia uses a simple greedy mechanism to periodically order all unfinished coflows; each host sets priorities for its flows using corresponding coflow order and offloads the flow scheduling and rate allocation to the underlying priority-enabled transport layer. We evaluate Sincronia over a real testbed comprising 16-servers and commodity switches, and using simulations across a variety of workloads. Evaluation results suggest that Sincronia not only admits a practical, near-optimal design but also improves upon state-of-the-art network designs for coflows (sometimes by as much as 8X).
- 2018. Sincronia Repository. https://github.com/sincronia-coflow.Google Scholar
- Saksham Agarwal, Shijin Rajakrishnan, Akshay Narayan, Rachit Agarwal, David Shmoys, and Amin Vahdat. 2018. Sincronia: Near-Optimal Network Design for Coflows. In Tech Report.Google ScholarDigital Library
- Saba Ahmadi, Samir Khuller, Manish Purohit, and Sheng Yang. 2017. On scheduling coflows. In MOS IPCO.Google Scholar
- Mohammad Alizadeh, Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker. 2013. pFabric: Minimal Near-optimal Datacenter Transport. In ACM SIGCOMM. Google ScholarDigital Library
- Nikhil Bansal and Subhash Khot. 2010. Inapproximability of hyper-graph vertex cover and applications to scheduling problems. In EATCS ICALP. Google ScholarDigital Library
- Kwok Ho Chan, Jozef Babiarz, and Fred Baker. 2006. Configuration Guidelines for DiffServ Service Classes. https://tools/ietf.org/html/rfc4594.Google Scholar
- Mosharaf Chowdhury and Ion Stoica. 2012. Coflow: A networking abstraction for cluster applications. In ACM HotNets. Google ScholarDigital Library
- Mosharaf Chowdhury and Ion Stoica. 2015. Efficient coflow scheduling without prior knowledge. In ACM SIGCOMM. Google ScholarDigital Library
- Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I Jordan, and Ion Stoica. 2011. Managing data transfers in computer clusters with orchestra. In ACM SIGCOMM. Google ScholarDigital Library
- Mosharaf Chowdhury, Yuan Zhong, and Ion Stoica. 2014. Efficient coflow scheduling with varys. In ACM SIGCOMM. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: simplified data processing on large clusters. In USENIX OSDI. Google ScholarDigital Library
- Fahad R Dogar, Thomas Karagiannis, Hitesh Ballani, and Antony Rowstron. 2014. Decentralized task-aware scheduling for data center networks. In ACM SIGCOMM. Google ScholarDigital Library
- Peter X Gao, Akshay Narayan, Gautam Kumar, Rachit Agarwal, Sylvia Ratnasamy, and Scott Shenker. 2015. phost: Distributed near-optimal datacenter transport over commodity network fabric. In ACM CoNEXT. Google ScholarDigital Library
- Naveen Garg, Amit Kumar, and Vinayaka Pandit. 2007. Order scheduling models: hardness and algorithms. In IARCS FSTTCS. Google ScholarDigital Library
- Mark Handley, Costin Raiciu, Alexandru Agache, Andrei Voinescu, Andrew Moore, Gianni Antichi, and Marcin Wojcik. 2017. Re-architecting datacenter networks and stacks for low latency and high performance. In ACM SIGCOMM. Google ScholarDigital Library
- Chi-Yao Hong, Matthew Caesar, and P Godfrey. 2012. Finishing flows quickly with preemptive scheduling. In ACM SIGCOMM. Google ScholarDigital Library
- Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: distributed data-parallel programs from sequential building blocks. In ACM EuroSys. Google ScholarDigital Library
- Hamidreza Jahanjou, Erez Kantor, and Rajmohan Rajaraman. 2017. Asymptotically Optimal Approximation Algorithms for Coflow Scheduling. In ACM SPAA. Google ScholarDigital Library
- Samir Khuller, Jingling Li, Pascal Sturmfels, Kevin Sun, and Prayaag Venkat. 2018. Select and Permute: An Improved Online Framework for Scheduling to Minimize Weighted Completion Time. In LATIN.Google Scholar
- Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8): 716--727. Google ScholarDigital Library
- Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In ACM SIGMOD. Google ScholarDigital Library
- Zhen Qiu, Cliff Stein, and Yuan Zhong. 2015. Minimizing the total weighted completion time of coflows in datacenter networks. In ACM SPAA. Google ScholarDigital Library
- Thomas A. Roemer. 2006. A note on the complexity of the concurrent open shop problem. In Journal of Scheduling, 9(4): 389--396. Springer. Google ScholarDigital Library
- Sushant Sachdeva and Rishi Saket. 2013. Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover. In IEEE CCC.Google Scholar
- Christo Wilson, Hitesh Ballani, Thomas Karagiannis, and Ant Rowtron. 2011. Better never than late: Meeting deadlines in datacenter networks. In ACM SIGCOMM. Google ScholarDigital Library
- Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. 2008. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language.. In USENIX OSDI. Google ScholarDigital Library
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In USENIX NSDI. Google ScholarDigital Library
- Hong Zhang, Li Chen, Bairen Yi, Kai Chen, Mosharaf Chowdhury, and Yanhui Geng. 2016. CODA: Toward automatically identifying and scheduling coflows in the dark. In ACM SIGCOMM. Google ScholarDigital Library
- Yangming Zhao, Kai Chen, Wei Bai, Minlan Yu, Chen Tian, Yanhui Geng, Yiming Zhang, Dan Li, and Sheng Wang. 2015. Rapier: Integrating routing and scheduling for coflow-aware data center networks. In IEEE INFOCOM.Google Scholar
Index Terms
- Sincronia: near-optimal network design for coflows
Recommendations
Efficient coflow scheduling with Varys
SIGCOMM '14: Proceedings of the 2014 ACM conference on SIGCOMMCommunication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-...
PROSE: Multi-round fair coflow scheduling without prior knowledge
AbstractCoflow scheduling plays a crucial role in enhancing network-level performance in datacenter applications. Existing works on coflow scheduling predominantly concentrate on clairvoyant schedulers, which assume prior knowledge of coflow ...
Highlights- Fair non-clairvoyant coflow scheduler for non-cooperative environments.
- Two-...
Saath: Speeding up CoFlows by Exploiting the Spatial Dimension
CoNEXT '17: Proceedings of the 13th International Conference on emerging Networking EXperiments and TechnologiesCoFlow scheduling improves data-intensive application performance by improving their networking performance. State-of-the-art CoFlow schedulers in essence approximate the classic online Shortest-Job-First (SJF) scheduling, designed for a single CPU, in ...
Comments