skip to main content
10.1145/2517349.2517350acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open Access

There is more consensus in Egalitarian parliaments

Published:03 November 2013Publication History

ABSTRACT

This paper describes the design and implementation of Egalitarian Paxos (EPaxos), a new distributed consensus algorithm based on Paxos. EPaxos achieves three goals: (1) optimal commit latency in the wide-area when tolerating one and two failures, under realistic conditions; (2) uniform load balancing across all replicas (thus achieving high throughput); and (3) graceful performance degradation when replicas are slow or crash.

Egalitarian Paxos is to our knowledge the first protocol to achieve the previously stated goals efficiently---that is, requiring only a simple majority of replicas to be non-faulty, using a number of messages linear in the number of replicas to choose a command, and committing commands after just one communication round (one round trip) in the common case or after at most two rounds in any case. We prove Egalitarian Paxos's properties theoretically and demonstrate its advantages empirically through an implementation running on Amazon EC2.

Skip Supplemental Material Section

Supplemental Material

d2-10-lulian-moraru.mp4

mp4

1,015.8 MB

References

  1. M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. Thrifty generic broadcast. In Proc. 14th International Conference on Distributed Computing, DISC '00, pages 268--282, London, UK, UK, 2000. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J. Baker, C. Bond, J. C. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Leon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In Proc. of the Conference on Innovative Data system Research (CIDR), pages 223--234, 2011.Google ScholarGoogle Scholar
  3. M. Biely, Z. Milosevic, N. Santos, and A. Schiper. Spaxos: Offloading the leader for high throughput state machine replication. In Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proc. 7th USENIX OSDI, Seattle, WA, Nov. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. J. Camargos, R. M. Schmidt, and F. Pedone. Multicoordinated paxos. In Proc. 26th annual ACM symposium on Principles of distributed computing, PODC '07, pages 316--317, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43: 225--267, Mar. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: an engineering perspective. In Proc. 26th ACM SOSP, PODC '07, pages 398--407, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. In Proc. 10th USENIX OSDI. USENIX, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374--382, Apr. 1985. ISSN 0004-5411. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Google AppEngine. High replication datastore, 2012. https://developers.google.com/appengine/docs/java/datastore/overview.Google ScholarGoogle Scholar
  11. M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3), July 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In Proc. USENIX ATC, USENIXATC'10, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Kaptritsos, Y. Wang, V. Quema, A. Clement, L. Alvisi, and M. Dahlin. Eve: Execute-verify replication for multi-core servers. In Proc. 10th USENIX OSDI, Hollywood, CA, Oct. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data center consistency. In Proc. 8th ACM European Conference on Computer Systems (EuroSys), Apr. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133--169, 1998. ISSN 0734-2071. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. L. Lamport. Paxos made simple. ACM SIGACT News, 32 (4), Dec. 2001.Google ScholarGoogle Scholar
  17. L. Lamport. Generalized consensus and Paxos. http://research.microsoft.com/apps/pubs/default.aspx?id=64631, 2005.Google ScholarGoogle Scholar
  18. L. Lamport. Fast Paxos. http://research.microsoft.com/apps/pubs/default.aspx?id=64624, 2006.Google ScholarGoogle Scholar
  19. L. Lamport, D. Malkhi, and L. Zhou. Vertical Paxos and primary-backup replication. Technical report, Microsoft Research, 2009.Google ScholarGoogle Scholar
  20. L. Lamport, D. Malkhi, and L. Zhou. Reconfiguring a state machine. SIGACT News, 41(1), Mar. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. Liskov and J. Cowling. Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT Computer Science and Artificial Intelligence Laboratory, 2012.Google ScholarGoogle Scholar
  22. J. MacCormick, N. Murphy, M. Najork, C. A. Thekkath, and L. Zhou. Boxwood: abstractions as the foundation for storage infrastructure. In Proc. 6th USENIX OSDI, San Francisco, CA, Dec. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Y. Mao, F. P. Junqueira, and K. Marzullo. Mencius: building efficient replicated state machines for WANs. In Proc. 8th USENIX OSDI, pages 369--384, San Diego, CA, Dec. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. I. Moraru, D. G. Andersen, and M. Kaminsky. Epaxos code base. https://github.com/efficient/epaxos, Aug. 2013.Google ScholarGoogle Scholar
  25. I. Moraru, D. G. Andersen, and M. Kaminsky. A proof of correctness for Egalitarian Paxos. Technical report, Parallel Data Laboratory, Carnegie Mellon University, Aug. 2013. http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-PDL-13-111.pdf.Google ScholarGoogle Scholar
  26. F. Pedone and A. Schiper. Handling message semantics with generic broadcast protocols. Distributed Computing, 15:97--107, Apr. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. F. Pedone and A. Schiper. Optimistic atomic broadcast: a pragmatic viewpoint. Theoretical Computer Science, 291: 79--101, Jan. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. tpc-c. TPC benchmark C. http://www.tpc.org/tpcc/spec/tpcc_current.pdf, 2010.Google ScholarGoogle Scholar
  29. P. Zieliński. Optimistic generic broadcast. In Proc. 19th International Symposium on Distributed Computing (DISC), pages 369--383, Kraków, Poland, Sept. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
    November 2013
    498 pages
    ISBN:9781450323888
    DOI:10.1145/2517349

    Copyright © 2013 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 3 November 2013

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate131of716submissions,18%

    Upcoming Conference

    SOSP '24

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader