ABSTRACT
We study the feasibility and cost of implementing Ω---a fundamental failure detector at the core of many algorithms---in systems with weak reliability and synchrony assumptions. Intuitively, Ω allows processes to eventually elect a common leader. We first give an algorithm that implements Ω in a weak system S where processes are synchronous, but: (a) any number of them may crash, and (b) only the output links of an unknown correct process are eventually timely (all other links can be asynchronous and/or lossy). This is in contrast to previous implementations of Ω which assume that a quadratic number of links are eventually timely, or systems that are strong enough to implement the eventually perfect failure detector P. We next show that implementing Ω in S is expensive: even if we want an implementation that tolerates just one process crash, all correct processes (except possibly one) must send messages forever; moreover, a quadratic number of links must carry messages forever. We then show that with a small additional assumption---the existence of some unknown correct process whose asynchronous links are lossy but fair---we can implement Ω efficiently: we give an algorithm for Ω such that eventually only one process (the elected leader) sends messages.
- M. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. Stable leader election (extended abstract). In Proceedings of the 15th International Symposium on Distributed Computing, LNCS 2180, pages 108--122. Springer-Verlag, 2001.]] Google ScholarDigital Library
- M. K. Aguilera, C. Delporte-Gallet, H. Fanconnier, and S. Toueg. On implementing Omega with weak reliability and synchrony assumptions. Technical Report 2003-7, LIAFA, Université Paris 7-Denis Diderot (France), 2003.]]Google ScholarDigital Library
- M. Bertier, O. Marin, and P. Sens. Implementation and performance evaluation of an adaptable failure detector. In Proceedings of the 2002 International Conference on Dependable Systems and Networks, June 2002.]] Google ScholarDigital Library
- M. Castro and B. Liskov. Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems (TOCS), 20(4):398--461, Nov. 2002.]] Google ScholarDigital Library
- T. D. Chandra, V. Hadzilacos, and S. Toueg. The weakest failure detector for solving consensus. J. ACM, 43(4):685--722, July 1996.]] Google ScholarDigital Library
- T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225--267, Mar. 1996.]] Google ScholarDigital Library
- W. Chen, S. Toueg, and M. K. Aguilera. On the quality of service of failure detectors. IEEE Transactions on computers, 1(51):13--32, Jan. 2002.]] Google ScholarDigital Library
- F. Chu. Reducing Ω to W. Information Processing Letters, 67(6):298--293, Sept. 1998.]] Google ScholarDigital Library
- R. D. Prisco, B. Lampson, and N. A. Lynch. Revisiting the Paxos algorithm. In Proceedings of the 11th Workshop on Distributed Algorithms, LNCS 1320, pages 11--125. Springer-Verlag, Sept. 1997.]] Google ScholarDigital Library
- B. Deianov and S. Toueg. Failure detector service for dependable computing. In Proceedings of the 2000 International Conference on Dependable Systems and Networks (ICDSN/FTCS-30), pages B14--B15. IEEE computer society press, June 2000.]]Google Scholar
- C. Dwork, N. A. Lynch, and L. Stockmeyer. Consensus in the presence of partial synchrony. J. ACM, 35(2):288--323, Apr. 1988.]] Google ScholarDigital Library
- C. Fetzer, M. Raynal, and F. Tronel. A failure detection protocol based on a lazy approach. Research Report 1367, IRISA, Nov. 2000.]]Google Scholar
- E. Gafni and L. Lamport. Disk paxos. In Proceedings of the 14th International Symposium on Distributed Computing, LNCS 1914, pages 330--344. Springer-Verlag, 2000.]] Google ScholarDigital Library
- R. Guerraoui and P. Dutta. Fast indulgent consensus with zero degradation. In Proceedings of the 4th European Dependable Computing Conference, Oct. 2002.]] Google ScholarDigital Library
- V. Hadzilacos, 2002. Comparison between S and P, personal communication.]]Google Scholar
- I. Keidar and S. Rajsbaum. On the cost of fault-tolerant consensus when there are no faults-a tutorial. In Tutorial 21th ACM Symposium on Principles of Distributed Computing, July 2002.]]Google Scholar
- L. Lamport. The Part-Time parliament. ACM Transactions on Computer Systems, 16(2):133--169, May 1998.]] Google ScholarDigital Library
- L. Lamport. Paxos made simple. SIGACT News, 32(4):18--25, Dec. 2001.]]Google Scholar
- M. Larrea, S. Arévalo, and A. Fernández. Efficient algorithms to implement unreliable failure detectors in partially synchronous systems. In Proceedings of the 13th International Symposium on Distributed Algorithms, LNCS 1693, pages 34--48, Sept. 1999.]] Google ScholarDigital Library
- M. Larrea, A. Fernández, and S. Arévalo. Optimal implementation of the weakest failure detector for solving consensus. In Proceedings of the 19th Symposium on Reliable Distributed Systems, pages 52--59. IEEE Computer Society Press, Oct. 2000.]] Google ScholarDigital Library
- M. Larrea, A. Fernandez, and S. Arevalo. Eventually consistent failure detectors. In ACM Symposium on Parallel Algorithms and Architectures, pages 326--327, 2001.]] Google ScholarDigital Library
- A. Mostéfaoui and M. Raynal. Leader-based consensus. Parallel Processing Letters, 11(1) :95--107, 2001.]]Google ScholarCross Ref
- R. van Renesse, Y. Minsky, and M. M. Hayden. A gossip-based failure detection service. In Proceedings of Middleware'98, Sept. 1998.]]Google ScholarCross Ref
Index Terms
- On implementing omega with weak reliability and synchrony assumptions
Recommendations
On implementing omega in systems with weak reliability and synchrony assumptions
We study the feasibility and cost of implementing Ω--a fundamental failure detector at the core of many algorithms--in systems with weak reliability and synchrony assumptions. Intuitively, Ω allows processes to eventually elect a common leader. We first ...
Eventual Leader Election with Weak Assumptions on Initial Knowledge, Communication Reliability, and Synchrony
DSN '06: Proceedings of the International Conference on Dependable Systems and NetworksThis paper considers the eventual leader election problem in asynchronous message-passing systems where an arbitrary number t of processes can crash (t \lt n, where n is the total number of processes). It considers weak assumptions both on the initial ...
Strong and weak virtual synchrony in Horus
SRDS '96: Proceedings of the 15th Symposium on Reliable Distributed SystemsThis paper presents two variants of virtual synchrony, which are supported by Horus. The first variant, called strong virtual synchrony, includes the property that every message is delivered within the view in which it is sent. This property is very ...
Comments