survey

Classification Framework of MapReduce Scheduling Algorithms

Published: 16 April 2015

Abstract

A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and meeting multiple quality requirements by controlling the order and distribution of user, job, and task execution. This survey presents a comprehensive, structured review of the scheduling algorithms proposed so far, organized by a novel multidimensional classification framework. The framework has three dimensions: (i) meeting quality requirements, (ii) scheduling entities, and (iii) adapting to dynamic environments; each dimension has its own taxonomy. The survey also recommends an empirical evaluation framework for these algorithms and identifies open issues and directions for future research.


        • Published in

          ACM Computing Surveys, Volume 47, Issue 3
          April 2015
          602 pages
          ISSN:0360-0300
          EISSN:1557-7341
          DOI:10.1145/2737799
          • Editor: Sartaj Sahni

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 April 2015
          • Accepted: 1 December 2014
          • Revised: 1 October 2014
          • Received: 1 January 2014

          Qualifiers

          • survey
          • Research
          • Refereed
