ABSTRACT
Real-time transfer optimization approaches are promising because they can discover the optimal transfer configuration at runtime without requiring upfront work or making assumptions about the underlying system architecture. However, existing implementations converge slowly because they run many sample transfers with suboptimal configurations. In this work, we evaluate time-series models to minimize the impact of these sample transfers by shortening their duration without degrading accuracy. Results gathered in various networks with a rich set of transfer configurations indicate that, in most cases, an autoregressive model can accurately estimate sample transfer throughput in less than 5 seconds, which is up to a 4x improvement over the state-of-the-art solution. We also find that, while the most common transfer applications report transfer throughput at most once a second, decreasing the reporting interval is key to further reducing the impact of sample transfers by determining their performance quickly.
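To make the idea of estimating sample transfer throughput from a short observation window concrete, the following is a minimal sketch and not the authors' implementation. It assumes a first-order autoregressive model, x[t] = c + phi * x[t-1], fitted by ordinary least squares to the throughput reports seen so far, and it takes the model's fixed point c / (1 - phi) as the estimate of the throughput the transfer would settle at. The model order, the fitting method, and the function names `fit_ar1` and `estimate_steady_throughput` are assumptions chosen for illustration.

```python
def fit_ar1(samples):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi).

    `samples` is a list of periodic throughput reports (hypothetical
    units, e.g. Mbps). AR(1) is an assumed model order for this sketch.
    """
    xs = samples[:-1]   # predictors: x[t-1]
    ys = samples[1:]    # targets:    x[t]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All predictors identical: no slope can be fitted.
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = cov / var_x
    c = mean_y - phi * mean_x
    return c, phi


def estimate_steady_throughput(samples):
    """Estimate the throughput the transfer converges to.

    Uses the fixed point c / (1 - phi) of the fitted AR(1) model.
    Falls back to the last report when the fit is not stationary.
    """
    c, phi = fit_ar1(samples)
    if abs(phi) >= 1:
        return samples[-1]
    return c / (1 - phi)


if __name__ == "__main__":
    # Hypothetical ramp-up: reports collected during the first seconds.
    reports = [0.0, 100.0, 150.0, 175.0, 187.5]
    print(estimate_steady_throughput(reports))
```

Under this sketch, a shorter reporting interval yields more samples within the same wall-clock window, so the fit can be attempted earlier in the sample transfer.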