research-article

Open Access

Sparrow: distributed, low latency scheduling

Authors:
Kay Ousterhout

University of California, Berkeley

University of California, Berkeley
View Profile

,
Patrick Wendell

University of California, Berkeley

University of California, Berkeley
View Profile

,
Matei Zaharia

University of California, Berkeley

University of California, Berkeley
View Profile

,
Ion Stoica

University of California, Berkeley

University of California, Berkeley
View Profile

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems PrinciplesNovember 2013Pages 69–84https://doi.org/10.1145/2517349.2522716

Published:03 November 2013Publication History

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

Pages 69–84

ABSTRACT

Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.

Supplemental Material

d1-05-kay-ousterhout.mp4

mp4

1.2 GB

Download

References

Apache Thrift. http://thrift.apache.org.Google Scholar
G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Why Let Resources Idle? Aggressive Cloning of Jobs with Dolly. In HotCloud, 2012. Google ScholarDigital Library
G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the Outliers in Map-Reduce Clusters using Mantri. In Proc. OSDI, 2010. Google ScholarDigital Library
D. Bradley, T. S. Clair, M. Farrellee, Z. Guo, M. Livny, I. Sfiligoi, and T. Tannenbaum. An Update on the Scalability Limits of the Condor Batch System. Journal of Physics: Conference Series, 331(6), 2011.Google ScholarCross Ref
J. Dean and L. A. Barroso. The Tail at Scale. Communications of the ACM, 56(2), February 2013. Google ScholarDigital Library
A. Demers, S. Keshav, and S. Shenker. Analysis and Simulation of a Fair Queueing Algorithm. In Proc. SIGCOMM, 1989. Google ScholarDigital Library
D. L. Eager, E. D. Lazowska, and J. Zahorjan. Adaptive Load Sharing in Homogeneous Distributed Systems. IEEE Transactions on Software Engineering, 1986. Google ScholarDigital Library
B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A Platform For Fine-Grained Resource Sharing in the Data Center. In Proc. NSDI, 2011. Google ScholarDigital Library
M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In Proc. SOSP, 2009. Google ScholarDigital Library
M. A. Jette, A. B. Yoo, and M. Grondona. SLURM: Simple Linux Utility for Resource Management. In Proc. Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, pages 44--60. Springer, 2003.Google Scholar
M. Kornacker and J. Erickson. Cloudera Impala: Real Time Queries in Apache Hadoop, For Real. http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/, October 2012.Google Scholar
S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. Proc. VLDB Endow., 2010. Google ScholarDigital Library
M. Mitzenmacher. How Useful is Old Information? volume 11, pages 6--20, 2000. Google ScholarDigital Library
M. Mitzenmacher. The Power of Two Choices in Randomized Load Balancing. IEEE Transactions on Parallel and Distributed Computing, 12(10):1094--1104, 2001. Google ScholarDigital Library
M. Mitzenmacher. The Power of Two Random Choices: A Survey of Techniques and Results. In S. Rajasekaran, P. Pardalos, J. Reif, and J. Rolim, editors, Handbook of Randomized Computing, volume 1, pages 255--312. Springer, 2001.Google ScholarCross Ref
A. C. Murthy. The Next Generation of Apache MapReduce. http://developer.yahoo.com/blogs/hadoop/next-generation-apache-hadoop-mapreduce-3061.html, February 2012.Google Scholar
K. Ousterhout, A. Panda, J. Rosen, S. Venkataraman, R. Xin, S. Ratnasamy, S. Shenker, and I. Stoica. The Case for Tiny Tasks in Compute Clusters. In Proc. HotOS, 2013. Google ScholarDigital Library
G. Park. A Generalization of Multiple Choice Balls-into-Bins. In Proc. PODC, pages 297--298, 2011. Google ScholarDigital Library
L. Rudolph, M. Slivkin-Allalouf, and E. Upfal. A Simple Load Balancing Scheme for Task Allocation in Parallel Machines. In Proc. SPAA, 1991. Google ScholarDigital Library
M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proc. EuroSys, 2013. Google ScholarDigital Library
B. Sharma, V. Chudnovsky, J. L. Hellerstein, R. Rifaat, and C. R. Das. Modeling and Synthesizing Task Placement Constraints in Google Compute Clusters. In Proc. SOCC, 2011. Google ScholarDigital Library
D. Shue, M. J. Freedman, and A. Shaikh. Performance Isolation and Fairness for Multi-Tenant Cloud Storage. In Proc. OSDI, 2012. Google ScholarDigital Library
T. White. Hadoop: The Definitive Guide. O'Reilly Media, 2009. Google ScholarDigital Library
R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica. Shark: SQL and Rich Analytics at Scale. In Proc. SIGMOD, 2013. Google ScholarDigital Library
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay Scheduling: A Simple Technique For Achieving Locality and Fairness in Cluster Scheduling. In Proc. EuroSys, 2010. Google ScholarDigital Library
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proc. NSDI, 2012. Google ScholarDigital Library
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In Proc. OSDI, 2008. Google ScholarDigital Library

Recommendations

Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

In this paper, we consider the problem of scheduling n deteriorating jobs with release dates on a single (batching) machine. Each job's processing time is a simple linear function of its starting time. The objective is to minimize the maximum lateness. ...
Read More
Primary-secondary bicriteria scheduling on identical machines to minimize the total completion time of all jobs and the maximum T-time of all machines

In this paper, we study a new primary-secondary bicriteria scheduling problem on identical machines. The primary objective is to minimize the total completion time of all jobs and the secondary objective is to minimize the maximum T-time of all machines,...
Read More
Sparrow Search Algorithm for Solving Flexible Jobshop Scheduling Problem
Advances in Swarm Intelligence
Abstract
With the global development of the third industrial revolution, intelligent manufacturing has received attention from many countries and regions since it was first proposed. In the next ten years, intelligent manufacturing has become an important ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
November 2013
498 pages
ISBN:9781450323888
DOI:10.1145/2517349
General Chair:
Michael Kaminsky
Intel Labs
,
Program Chair:
Mike Dahlin
Google and UT Austin
Copyright © 2013 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 November 2013
Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate131of716submissions,18%
Upcoming Conference
SOSP '24

Sponsor:

sigops

ACM SIGOPS 29th Symposium on Operating Systems Principles

November 5 - 8, 2024

Austin , TX , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 424
  Total Citations
  View Citations
- 12,233
  Total Downloads
- Downloads (Last 12 months)717
- Downloads (Last 6 weeks)87
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Sparrow: distributed, low latency scheduling

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Recommendations

Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

Primary-secondary bicriteria scheduling on identical machines to minimize the total completion time of all jobs and the maximum T-time of all machines

Sparrow Search Algorithm for Solving Flexible Jobshop Scheduling Problem

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Sparrow: distributed, low latency scheduling

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Recommendations

Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

Primary-secondary bicriteria scheduling on identical machines to minimize the total completion time of all jobs and the maximum T-time of all machines

Sparrow Search Algorithm for Solving Flexible Jobshop Scheduling Problem

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media