skip to main content
tutorial

Paxos Made Moderately Complex

Published:17 February 2015Publication History
Skip Abstract Section

Abstract

This article explains the full reconfigurable multidecree Paxos (or multi-Paxos) protocol. Paxos is by no means a simple protocol, even though it is based on relatively simple invariants. We provide pseudocode and explain it guided by invariants. We initially avoid optimizations that complicate comprehension. Next we discuss liveness, list various optimizations that make the protocol practical, and present variants of the protocol.

References

  1. Peter Alvaro, Tyson Condie, Neil Conway, Joseph M. Hellerstein, and Russell Sears. 2010. I do declare: Consensus in a logic language. SIGOPS Operating Systems Review 43, 4, 25--30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. 1995. Sharing memory robustly in message-passing systems. Journal of the ACM 42, 1, 124--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Romain Boichat, Partha Dutta, Svend Frølund, and Rachid Guerraoui. 2003. Deconstructing Paxos. SIGACT News 34, 1, 47--67. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters, and Peng Li. 2011. Paxos replicated state machines as the basis of a high-performance data store. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11). 141--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Navin Budhiraja, Keith Marzullo, Fred B. Schneider, and Sam Toueg. 1993. The primary-backup approach. In Distributed Systems (2nd ed.), S. Mullender (Ed.). ACM Press/Addison-Wesley, New York, NY, 199--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Mike Burrows. 2006. The chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI'06). 335--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Tushar Deepak Chandra and Sam Toueg. 1991. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing (PODC'91). ACM, New York, NY, 325--340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Roberto De Prisco, Butler W. Lampson, and Nancy Lynch. 2000. Revisiting the Paxos algorithm. Theoretical Computer Science 243, 1--2, 35--91. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. Journal of the ACM 32, 2, 374--382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Eli Gafni and Leslie Lamport. 2003. Disk Paxos. Distributed Computing 16, 1, 1--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cary G. Gray and David R. Cheriton. 1989. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (SOSP'89). ACM, New York, NY, 202--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Jonathan Kirsch and Yair Amir. 2008a. Paxos for System Builders. Technical Report CNDS-2008-2. Johns Hopkins University, Baltimore, MD.Google ScholarGoogle Scholar
  13. Jonathan Kirsch and Yair Amir. 2008b. Paxos for system builders: An overview. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS'08). ACM, New York, NY, Article No. 3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Leslie Lamport. 1978. Time, clocks, and the ordering of events in a distributed system. Communations of the ACM 21, 7, 558--565. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Leslie Lamport. 1998. The part-time parliament. ACM Transactions on Computer Systems 16, 2, 133--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Leslie Lamport. 2001. Paxos made simple. ACM SIGACT News 32, 4, 51--58.Google ScholarGoogle Scholar
  17. Leslie Lamport. 2005. Generalized Consensus and Paxos. Technical Report MSR-TR-2005-33. Microsoft Research, Mountain View, CA. Available at http://research.microsoft.com/pubs/64631/tr-2005-33.pdfGoogle ScholarGoogle Scholar
  18. Leslie Lamport. 2006. Fast Paxos. Distributed Computing 19, 2, 79--103.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. 2008. Stoppable Paxos. Technical Report. Microsoft Research, Mountain View, CA. Available at http://research.microsoft.com/apps/pubs/default.aspx?id=101826Google ScholarGoogle Scholar
  20. Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. 2009. Vertical Paxos and primary-backup replication. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (PODC'09). ACM, New York, NY, 312--313. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Leslie Lamport and Mike Massa. 2004. Cheap Paxos. In Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04). IEEE, Los Alamitos, CA, 307--315. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Butler W. Lampson. 1996. How to build a highly available system using consensus. In Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG'96). 1--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Butler W. Lampson. 2001. The ABCD's of Paxos. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (PODC'01). ACM, New York, NY, 13--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Harry C. Li, Allen Clement, Amitanand S. Aiyer, and Lorenzo Alvisi. 2007. The Paxos register. In Proceedings of the 26th IEEE International Symposium on Reliable Distributed Systems (SRDS'07). IEEE, Los Alamitos, CA, 114--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jacob R. Lorch, Atul Adya, William J. Bolosky, Ronnie Chaiken, John R. Douceur, and Jon Howell. 2006. The SMART way to migrate replicated stateful services. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006 (EuroSys'06). ACM, New York, NY, 103--115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Yanhua Mao, Flavio P. Junqueira, and Keith Marzullo. 2008. Mencius: Building efficient replicated state machines for WANs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI'08). 369--384. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. David Mazières. 2007. Paxos Made Practical. Technical Report. Stanford University, Stanford, CA. Available at http://www.scs.stanford.edu/∼dm/home/papers/paxos.pdfGoogle ScholarGoogle Scholar
  28. Hein Meling and Leander Jehl. 2013. Tutorial summary: Paxos explained from scratch. In Principles of Distributed Systems. Lecture Notes in Computer Science, Vol. 8304. Springer, 1--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, and Peter Schwarz. 1992. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems 17, 1, 94--162. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in egalitarian parliaments. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP'13). ACM, New York, NY, 358--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Brian M. Oki and Barbara H. Liskov. 1988. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing (PODC'88). ACM, New York, NY, 8--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference (ATC'14). 305--319. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Richard D. Schlichting and Fred B. Schneider. 1983. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems 1, 3, 222--238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Fred B. Schneider. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22, 4, 299--319. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Robert H. Thomas. 1979. A majority consensus approach to concurrency control for multiple copy databases. ACM Transactions on Database Systems 4, 2, 180--209. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Robbert van Renesse, Nicolas Schiper, and Fred B. Schneider. 2014. Vive la différence: Paxos vs. Viewstamped Replication vs. Zab. IEEE Transactions on Dependable and Secure Computing PP, 99, 1.Google ScholarGoogle Scholar
  37. Robbert van Renesse, Fred B. Schneider, and Johannes Gehrke. 2011. Nerio: Leader Election and Edict Ordering. Technical Report. Cornell University, Ithaca, NY. Available at http://hdl.handle.net/1813/23631Google ScholarGoogle Scholar

Index Terms

  1. Paxos Made Moderately Complex

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Computing Surveys
        ACM Computing Surveys  Volume 47, Issue 3
        April 2015
        602 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/2737799
        • Editor:
        • Sartaj Sahni
        Issue’s Table of Contents

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 February 2015
        • Accepted: 1 September 2014
        • Revised: 1 June 2014
        • Received: 1 August 2013
        Published in csur Volume 47, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • tutorial
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader