research-article

The Next 700 BFT Protocols

Authors:
Pierre-Louis Aublin (INSA Lyon, Villeurbanne, France)
Rachid Guerraoui (EPFL, Lausanne, Switzerland)
Nikola Knežević (IBM Research - Zurich, Rüschlikon, Switzerland)
Vivien Quéma (Grenoble INP, d'Hères, France)
Marko Vukolić (Eurécom, Rüschlikon, Switzerland)

ACM Transactions on Computer Systems, Volume 32, Issue 4, Article No. 12, pp. 1–45. https://doi.org/10.1145/2658994

Published: 20 January 2015

Abstract

We present Abstract (ABortable STate mAChine replicaTion), a new abstraction for designing and reconfiguring generalized replicated state machines that are, unlike traditional state machines, allowed to abort executing a client’s request if “something goes wrong.”

Abstract considerably simplifies the incremental development of efficient Byzantine fault-tolerant state machine replication (BFT) protocols, which are notoriously difficult to develop. In short, we treat a BFT protocol as a composition of Abstract instances. Each instance is developed and analyzed independently and is optimized for specific system conditions. We illustrate the power of Abstract through several examples.
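The idea of composing abortable instances can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the names `AbortableInstance`, `Composition`, `init_history`, and the `healthy` predicate are hypothetical, the replicated service is a toy running sum, and no networking, replication, or Byzantine fault handling is modeled. The sketch also assumes that an aborting instance hands its committed history to the next instance, so that the composition as a whole behaves like a single state machine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    """Result of one invocation: either a commit with a reply, or an abort."""
    committed: bool
    reply: Optional[int] = None  # set only when committed is True


class AbortableInstance:
    """Toy stand-in for one abortable state machine (hypothetical API)."""

    def __init__(self, name: str, healthy: Callable[[], bool]) -> None:
        self.name = name
        # Assumption: `healthy` models "the conditions this instance is
        # optimized for currently hold"; when it is False, "something
        # goes wrong" and the instance aborts.
        self.healthy = healthy
        self.history: List[int] = []  # committed requests, in order

    def init_history(self, history: List[int]) -> None:
        """Start from the history committed by the previous instance."""
        self.history = list(history)

    def invoke(self, request: int) -> Outcome:
        if not self.healthy():
            return Outcome(committed=False)  # abort, nothing is applied
        self.history.append(request)
        # Toy service: the reply is the running sum of committed requests.
        return Outcome(committed=True, reply=sum(self.history))


class Composition:
    """Runs one instance at a time and switches to the next one on abort."""

    def __init__(self, instances: List[AbortableInstance]) -> None:
        self.instances = instances
        self.active = 0  # index of the instance currently in charge

    def invoke(self, request: int) -> int:
        # Try each instance at most once per request.
        for _ in range(len(self.instances)):
            current = self.instances[self.active]
            outcome = current.invoke(request)
            if outcome.committed:
                return outcome.reply
            # Abort: pass the committed history on and switch instances.
            nxt = (self.active + 1) % len(self.instances)
            self.instances[nxt].init_history(current.history)
            self.active = nxt
        raise RuntimeError("every instance aborted the request")


# Usage: a "fast" instance that aborts once its assumptions break,
# backed by an instance that always commits.
fast_conditions_hold = [True]
fast = AbortableInstance("fast", lambda: fast_conditions_hold[0])
backup = AbortableInstance("backup", lambda: True)
service = Composition([fast, backup])

first = service.invoke(1)
second = service.invoke(2)
fast_conditions_hold[0] = False   # "something goes wrong"
third = service.invoke(4)         # fast aborts, backup takes over
```

In this sketch the client never sees the abort: the composition retries the request on the next instance, which starts from the history already committed. Each instance can therefore be written and reasoned about on its own, which is the property the abstract attributes to Abstract.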

We first show how Abstract can yield the benefits of a state-of-the-art BFT protocol in a less painful and less error-prone manner. Namely, we develop AZyzzyva, a new protocol that mimics the celebrated best-case behavior of Zyzzyva using less than 35% of the Zyzzyva code. To cover worst-case situations, our abstraction allows any existing BFT protocol to be used within AZyzzyva.

We then present Aliph, a new BFT protocol that outperforms previous BFT protocols in terms of both latency (by up to 360%) and throughput (by up to 30%). Finally, we present R-Aliph, a robust implementation of Aliph, that is, one whose performance degrades gracefully in the presence of Byzantine replicas and Byzantine clients.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Published in
ACM Transactions on Computer Systems Volume 32, Issue 4
January 2015
124 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2723895
Editor:
Todd C. Mowry
Carnegie Mellon University, Pittsburgh, PA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 January 2015
- Accepted: 1 July 2014
- Revised: 1 February 2014
- Received: 1 May 2012
Author Tags
Abstract
Byzantine
composability
fault tolerance
optimization
robustness
Qualifiers
- research-article
- Research
- Refereed