research-article

Open Access

There is more consensus in Egalitarian parliaments

Authors:
Iulian Moraru

Carnegie Mellon University and Intel Labs

Carnegie Mellon University and Intel Labs
View Profile

,
David G. Andersen

Carnegie Mellon University and Intel Labs

Carnegie Mellon University and Intel Labs
View Profile

,
Michael Kaminsky

Carnegie Mellon University and Intel Labs

Carnegie Mellon University and Intel Labs
View Profile

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems PrinciplesNovember 2013Pages 358–372https://doi.org/10.1145/2517349.2517350

Published:03 November 2013Publication History

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

Pages 358–372

ABSTRACT

This paper describes the design and implementation of Egalitarian Paxos (EPaxos), a new distributed consensus algorithm based on Paxos. EPaxos achieves three goals: (1) optimal commit latency in the wide-area when tolerating one and two failures, under realistic conditions; (2) uniform load balancing across all replicas (thus achieving high throughput); and (3) graceful performance degradation when replicas are slow or crash.

Egalitarian Paxos is to our knowledge the first protocol to achieve the previously stated goals efficiently---that is, requiring only a simple majority of replicas to be non-faulty, using a number of messages linear in the number of replicas to choose a command, and committing commands after just one communication round (one round trip) in the common case or after at most two rounds in any case. We prove Egalitarian Paxos's properties theoretically and demonstrate its advantages empirically through an implementation running on Amazon EC2.

Supplemental Material

d2-10-lulian-moraru.mp4

mp4

1,015.8 MB

Download

References

M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. Thrifty generic broadcast. In Proc. 14th International Conference on Distributed Computing, DISC '00, pages 268--282, London, UK, UK, 2000. Springer-Verlag. Google ScholarDigital Library
J. Baker, C. Bond, J. C. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Leon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In Proc. of the Conference on Innovative Data system Research (CIDR), pages 223--234, 2011.Google Scholar
M. Biely, Z. Milosevic, N. Santos, and A. Schiper. Spaxos: Offloading the leader for high throughput state machine replication. In Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on, 2012. Google ScholarDigital Library
M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proc. 7th USENIX OSDI, Seattle, WA, Nov. 2006. Google ScholarDigital Library
L. J. Camargos, R. M. Schmidt, and F. Pedone. Multicoordinated paxos. In Proc. 26th annual ACM symposium on Principles of distributed computing, PODC '07, pages 316--317, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43: 225--267, Mar. 1996. Google ScholarDigital Library
T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: an engineering perspective. In Proc. 26th ACM SOSP, PODC '07, pages 398--407, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. In Proc. 10th USENIX OSDI. USENIX, 2012. Google ScholarDigital Library
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374--382, Apr. 1985. ISSN 0004-5411. Google ScholarDigital Library
Google AppEngine. High replication datastore, 2012. https://developers.google.com/appengine/docs/java/datastore/overview.Google Scholar
M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3), July 1990. Google ScholarDigital Library
P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In Proc. USENIX ATC, USENIXATC'10, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarDigital Library
M. Kaptritsos, Y. Wang, V. Quema, A. Clement, L. Alvisi, and M. Dahlin. Eve: Execute-verify replication for multi-core servers. In Proc. 10th USENIX OSDI, Hollywood, CA, Oct. 2012. Google ScholarDigital Library
T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data center consistency. In Proc. 8th ACM European Conference on Computer Systems (EuroSys), Apr. 2013. Google ScholarDigital Library
L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133--169, 1998. ISSN 0734-2071. Google ScholarDigital Library
L. Lamport. Paxos made simple. ACM SIGACT News, 32 (4), Dec. 2001.Google Scholar
L. Lamport. Generalized consensus and Paxos. http://research.microsoft.com/apps/pubs/default.aspx?id=64631, 2005.Google Scholar
L. Lamport. Fast Paxos. http://research.microsoft.com/apps/pubs/default.aspx?id=64624, 2006.Google Scholar
L. Lamport, D. Malkhi, and L. Zhou. Vertical Paxos and primary-backup replication. Technical report, Microsoft Research, 2009.Google Scholar
L. Lamport, D. Malkhi, and L. Zhou. Reconfiguring a state machine. SIGACT News, 41(1), Mar. 2010. Google ScholarDigital Library
B. Liskov and J. Cowling. Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT Computer Science and Artificial Intelligence Laboratory, 2012.Google Scholar
J. MacCormick, N. Murphy, M. Najork, C. A. Thekkath, and L. Zhou. Boxwood: abstractions as the foundation for storage infrastructure. In Proc. 6th USENIX OSDI, San Francisco, CA, Dec. 2004. Google ScholarDigital Library
Y. Mao, F. P. Junqueira, and K. Marzullo. Mencius: building efficient replicated state machines for WANs. In Proc. 8th USENIX OSDI, pages 369--384, San Diego, CA, Dec. 2008. Google ScholarDigital Library
I. Moraru, D. G. Andersen, and M. Kaminsky. Epaxos code base. https://github.com/efficient/epaxos, Aug. 2013.Google Scholar
I. Moraru, D. G. Andersen, and M. Kaminsky. A proof of correctness for Egalitarian Paxos. Technical report, Parallel Data Laboratory, Carnegie Mellon University, Aug. 2013. http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-PDL-13-111.pdf.Google Scholar
F. Pedone and A. Schiper. Handling message semantics with generic broadcast protocols. Distributed Computing, 15:97--107, Apr. 2002. Google ScholarDigital Library
F. Pedone and A. Schiper. Optimistic atomic broadcast: a pragmatic viewpoint. Theoretical Computer Science, 291: 79--101, Jan. 2003. Google ScholarDigital Library
tpc-c. TPC benchmark C. http://www.tpc.org/tpcc/spec/tpcc_current.pdf, 2010.Google Scholar
P. Zieliński. Optimistic generic broadcast. In Proc. 19th International Symposium on Distributed Computing (DISC), pages 369--383, Kraków, Poland, Sept. 2005. Google ScholarDigital Library

Recommendations

Hierarchical Quorum Consensus: A New Algorithm for Managing Replicated Data

A novel algorithm for managing replicated data is presented. The proposed method is based on organizing the copies of an object into a logical, multilevel hierarchy, and extending the quorum consensus algorithm to such an environment. Several properties ...
Read More
Generalized Grid Quorum Consensus for Replica Control Protocol
CICN '11: Proceedings of the 2011 International Conference on Computational Intelligence and Communication Networks

In distributed systems it is often necessary to provide coordination among the multiple concurrent processes to tolerate the contention, periods of asynchrony and a number of failures. Quorum systems provide a decentralized approach for such ...
Read More
On the correctness of Egalitarian Paxos
Abstract
This paper identifies a problem in both the TLA + specification and the implementation of the Egalitarian Paxos protocol. It is related to how replicas switch from one ballot to another when computing the dependencies of a command. The ...
Highlights
- There is a problem in the specification and implementation of Egalitarian Paxos.
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
November 2013
498 pages
ISBN:9781450323888
DOI:10.1145/2517349
General Chair:
Michael Kaminsky
Intel Labs
,
Program Chair:
Mike Dahlin
Google and UT Austin
Copyright © 2013 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 November 2013
Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate131of716submissions,18%
Upcoming Conference
SOSP '24

Sponsor:

sigops

ACM SIGOPS 29th Symposium on Operating Systems Principles

November 5 - 8, 2024

Austin , TX , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 217
  Total Citations
  View Citations
- 12,994
  Total Downloads
- Downloads (Last 12 months)6,054
- Downloads (Last 6 weeks)1,253
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

There is more consensus in Egalitarian parliaments

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Recommendations

Hierarchical Quorum Consensus: A New Algorithm for Managing Replicated Data

Generalized Grid Quorum Consensus for Replica Control Protocol

On the correctness of Egalitarian Paxos

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

There is more consensus in Egalitarian parliaments

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Recommendations

Hierarchical Quorum Consensus: A New Algorithm for Managing Replicated Data

Generalized Grid Quorum Consensus for Replica Control Protocol

On the correctness of Egalitarian Paxos

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media