survey

Classification Framework of MapReduce Scheduling Algorithms

Authors:

Nidhi Tiwari
Infosys Ltd., India; IITB-Monash Research Academy, Bangalore, India

Santonu Sarkar
BITS Pilani K.K. Birla Goa Campus, Goa, Zuarinagar, Sancoale, India

Umesh Bellur
Indian Institute of Technology Bombay, Powai, India

Maria Indrawan
Monash University, Victoria, Australia

ACM Computing Surveys, Volume 47, Issue 3, Article No. 49, pp. 1–38, https://doi.org/10.1145/2693315

Published: 16 April 2015

Abstract

A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and in meeting multiple quality requirements by controlling the order and distribution of user, job, and task execution. This survey presents a comprehensive, structured review of the scheduling algorithms proposed so far, organized by a novel multidimensional classification framework. The framework's dimensions are (i) meeting quality requirements, (ii) scheduling entities, and (iii) adapting to dynamic environments; each dimension has its own taxonomy. The survey also recommends an empirical evaluation framework for these algorithms and identifies open issues and directions for future research.


Published in
ACM Computing Surveys Volume 47, Issue 3
April 2015
602 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2737799
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 April 2015
- Accepted: 1 December 2014
- Revised: 1 October 2014
- Received: 1 January 2014
Author Tags
Distributed computing
Hadoop
MapReduce
big-data
distributed data
scheduling
Qualifiers
- survey
- Research
- Refereed