Abstract
Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects.
We identify two classes of object failures: responsive and nonresponsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With nonresponsive failures, a faulty object may also “hang” without responding. In each class, we define crash, omission, and arbitrary modes of failure.
We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes ℱ, object types T, and t ≥ 0, we show how to implement a shared object of type T which is t-tolerant for ℱ. Such an object remains correct and wait-free even if up to t base objects fail according to ℱ. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result.
Graceful degradation is a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure modes, we show wheter this property can be achieved, and, if so, how.
- AFEK, Y., GREENBERG, D., MERRITT, M., AND TAUBENFELD, G. 1992. Computing with faulty shared memory. In Proceedings of the 11th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 47-58. Google Scholar
- AFEK, Y., GREENBERG, D., MERRITT, M., AND TAUBENFELD, G. 1995. Computing with faulty shared objects. J. ACM 42, 6 (Nov.), 1231-1274. Google ScholarDigital Library
- ASPNES, J. 1990. Time- and space-efficient randomized consensus. In Proceedings of the 9th Annual ACM Symposium on Principles of Distributed Computing (Quebec City, Que., Canada, Aug. 22-24). ACM, New York, pp. 325-332. Google Scholar
- BERMAN, P., GARAY, J., AND PERRY, K. J. 1989. Towards optimal distributed consensus. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE, New York, pp. 410-415.Google Scholar
- BLOOM, B. 1987. Constructing two-writer atomic registers. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 249-259. Google Scholar
- BURNS, J. E., AND PETERSON, G.L. 1987. Constructing multi-reader atomic values from nonatomic values. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 222-231. Google Scholar
- COAN, B.A. 1987. Achieving consensus in fault-tolerant distributed computer systems: Protocols, lower bounds, and simulations. Ph.D dissertation. Massachusetts Institute of Technology, Cambridge, Mass., June. Google Scholar
- COAN, g. A., AND WELCH, J.L. 1992. Modular construction of a Byzantine agreement protocol with optimal message bit complexity. Inf. Comput. 97, 1, 61-85. Google ScholarCross Ref
- COURTOIS, P. J., HEYMANS, R., AND PARNAS, D.L. 1971. Concurrent control with "readers" and "writers." Commun. ACM 14, 10 (Oct.), 667-668. Google ScholarDigital Library
- DOLEV, D., DWORK, C., AND STOCKMEYER, L. 1987.On the minimal synchronism needed for distributed consensus. J. ACM 34, 1 (Jan.), 77-97. Google ScholarDigital Library
- DOLEV, D., REISCHUK, R., AND STRONG, H.R. 1990. Early stopping in Byzantine agreement. J. ACM 37, 4 (Oct.), 720-741. Google ScholarDigital Library
- FISCHER, M., LYNCH, N., AND MERRITT, M. 1986. Easy impossibility proofs for distributed consensus problems. Distr. Comput. 1, 26-39. Google ScholarDigital Library
- FISCHER, M. J., LYNCH, N. A., AND PATERSON, M.S. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr.), 374-382. Google ScholarDigital Library
- HALDAR, S., AND VIDYASANKAR, K. 1991. A simple construction of 1-writer multi-reader multivalued atomic variable from regular variables. Tech Rep. No. 9108. Department of Computer Science, Memorial University of Newfoundland, St. John's, NF, Canada.Google Scholar
- HERLIHY, M. P. 1988. Impossibility and universality results for wait-free synchonization. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing (Toronto, Ont., Canada, Aug. 15-17). ACM, New York, pp. 276-290. Google Scholar
- HERLIHY, M. P. 1991a. Randomized wait-free concurrent objects. In Proceedings of the lOth Annual ACM Symposium on Principles of Distributed Computing (Montreal, Que., Canada, Aug. 19-21). ACM, New York, pp. 11-21. Google Scholar
- HERLIHY, M. P. 1991b. Wait-free synchronization. ACM Trans. Prog. Lang. Syst. 13, 1 (Jan.), 124-149. Google ScholarDigital Library
- HERLIHY, M. P., AND WING, J.M. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Prog. Lang. Syst. 12, 3 (July), 463-492. Google ScholarDigital Library
- JAYANTI, P., CHANDRA, T. D., AND TOUEG, S. 1996. Fault-tolerant wait-free shared objects. Tech. Rep. TR 96-1565. Department of Computer Science, Cornell Univ., Ithaca, New York, January. Google Scholar
- JAYANTI, P., AND TOUEG, S. 1992. Some results on the impossibility, universality, and decidability of consensus. In Proceedings of the 6th Workshop on Distributed Algorithms (Haifa, Israel, Nov.). Lecture Notes in Computer Science, vol. 647. Springer-Verlag, New York. Google Scholar
- KLEINBERG, J., AND MULLAINATHAN, S. 1993. Resource bounds and combinations of consensus objects. In Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing (Ithaca, N.Y., Aug. 15-18). ACM, New York, pp. 133-144. Google Scholar
- LAMPORT, L. 1977. Concurrent reading and writing. Commun. ACM 20, 11 (Nov.), 806-811. Google ScholarDigital Library
- LAMPORT, L. 1978. The implementation of reliable distributed multi-process systems. Comput. Netw. 2, 95-114.Google Scholar
- LAMPORT, L. 1986.On interprocess communication, Parts I and II. Distr. Comput. 1, 77-101.Google ScholarCross Ref
- LAMPORT, L., SHOSTAK, R., AND PEASE, M. 1982. The Byzantine generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 (July), 382-401. Google ScholarDigital Library
- LouI, M. C., AND ABU-AMARA. 1987. Memory requirements for agreement among unreliable asynchonous processes. Adv. Comput. Res. 4, 163-183.Google Scholar
- LYNCH, N., AND TUTTLE, M. 1988. An introduction to input/output automata. Tech. Rep. MIT/ LCS/TM-373. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
- NEWMAN-WOLFE, R. 1987. A protocol for wait-free, atomic, multi-reader shared variables. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 232-248. Google Scholar
- PEASE, M., SHOSTAK, R., AND LAMPORT, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2 (Apr.), 228-234. Google ScholarDigital Library
- PETERSON, G.L. 1983. A new solution to Lamport's concurrent programming problem using small shared variables. ACM Trans. Prog. Lang. Syst. 5, 1 (Jan.), 56-65. Google ScholarDigital Library
- PETERSON, G., AND BURNS, J. 1987. Concurrent reading while writing II: The multi-writer case. In Proceedings of the 28th Annual Symposium on Foundations of Computer Science. IEEE, New York.Google Scholar
- PLOTKIN, S.A. 1989. Sticky bits and universality of consensus. In Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing (Edmonton, Alb., Canada, Aug. 14-16). ACM, New York, pp. 159-175. Google Scholar
- SCHAFFER, R. 1988. On the correctness of atomic multi-writer registers. Tech. Rep. TR No.: MIT/LCS/TM-364. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
- SCHNEIDER, F.B. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299-319. Google ScholarDigital Library
- SINGH, A. K., ANDERSON, J. H., AND GOUDA, M.G. 1987. The elusive atomic register, revisited. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 206-221. Google Scholar
- SRIKANTH, T. K., AND TOUEG, S. 1987. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Dist. Comput. 2, 2, 80-94.Google ScholarDigital Library
- VIDYASANKAR, K. 1988. Converting Lamport's regular register to atomic register. IPL 28, 287-290. Google ScholarDigital Library
- VIDYASANKAR, K. 1989. An elegant 1-writer multireader multivalued atomic register. IPL 30, 221-223. Google ScholarDigital Library
- VITANYI, P., AND AWERBUCH, B. 1986. Atomic shared register access by asynchonous hardware. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science. IEEE, New York.Google Scholar
Index Terms
- Fault-tolerant wait-free shared objects
Recommendations
Fault-tolerant wait-free shared objects
SFCS '92: Proceedings of the 33rd Annual Symposium on Foundations of Computer ScienceThe authors classify object failures into two broad categories: responsive and non-responsive. They require that wait-free objects subject to responsive failures continue to respond (in finite time) to operation invocations. The responses may be ...
Robust wait-free hierarchies
The problem of implementing a shared object of one type from shared objects of other types has been extensively researched. Recent focus has mostly been on wait-free implementations, which permit every process to complete its operations on implemented ...
Comments