skip to main content
article
Free Access

Fault-tolerant wait-free shared objects

Published:01 May 1998Publication History
Skip Abstract Section

Abstract

Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects.

We identify two classes of object failures: responsive and nonresponsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With nonresponsive failures, a faulty object may also “hang” without responding. In each class, we define crash, omission, and arbitrary modes of failure.

We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes ℱ, object types T, and t ≥ 0, we show how to implement a shared object of type T which is t-tolerant for ℱ. Such an object remains correct and wait-free even if up to t base objects fail according to ℱ. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result.

Graceful degradation is a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure modes, we show wheter this property can be achieved, and, if so, how.

References

  1. AFEK, Y., GREENBERG, D., MERRITT, M., AND TAUBENFELD, G. 1992. Computing with faulty shared memory. In Proceedings of the 11th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 47-58. Google ScholarGoogle Scholar
  2. AFEK, Y., GREENBERG, D., MERRITT, M., AND TAUBENFELD, G. 1995. Computing with faulty shared objects. J. ACM 42, 6 (Nov.), 1231-1274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. ASPNES, J. 1990. Time- and space-efficient randomized consensus. In Proceedings of the 9th Annual ACM Symposium on Principles of Distributed Computing (Quebec City, Que., Canada, Aug. 22-24). ACM, New York, pp. 325-332. Google ScholarGoogle Scholar
  4. BERMAN, P., GARAY, J., AND PERRY, K. J. 1989. Towards optimal distributed consensus. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE, New York, pp. 410-415.Google ScholarGoogle Scholar
  5. BLOOM, B. 1987. Constructing two-writer atomic registers. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 249-259. Google ScholarGoogle Scholar
  6. BURNS, J. E., AND PETERSON, G.L. 1987. Constructing multi-reader atomic values from nonatomic values. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 222-231. Google ScholarGoogle Scholar
  7. COAN, B.A. 1987. Achieving consensus in fault-tolerant distributed computer systems: Protocols, lower bounds, and simulations. Ph.D dissertation. Massachusetts Institute of Technology, Cambridge, Mass., June. Google ScholarGoogle Scholar
  8. COAN, g. A., AND WELCH, J.L. 1992. Modular construction of a Byzantine agreement protocol with optimal message bit complexity. Inf. Comput. 97, 1, 61-85. Google ScholarGoogle ScholarCross RefCross Ref
  9. COURTOIS, P. J., HEYMANS, R., AND PARNAS, D.L. 1971. Concurrent control with "readers" and "writers." Commun. ACM 14, 10 (Oct.), 667-668. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. DOLEV, D., DWORK, C., AND STOCKMEYER, L. 1987.On the minimal synchronism needed for distributed consensus. J. ACM 34, 1 (Jan.), 77-97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. DOLEV, D., REISCHUK, R., AND STRONG, H.R. 1990. Early stopping in Byzantine agreement. J. ACM 37, 4 (Oct.), 720-741. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. FISCHER, M., LYNCH, N., AND MERRITT, M. 1986. Easy impossibility proofs for distributed consensus problems. Distr. Comput. 1, 26-39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. FISCHER, M. J., LYNCH, N. A., AND PATERSON, M.S. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr.), 374-382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. HALDAR, S., AND VIDYASANKAR, K. 1991. A simple construction of 1-writer multi-reader multivalued atomic variable from regular variables. Tech Rep. No. 9108. Department of Computer Science, Memorial University of Newfoundland, St. John's, NF, Canada.Google ScholarGoogle Scholar
  15. HERLIHY, M. P. 1988. Impossibility and universality results for wait-free synchonization. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing (Toronto, Ont., Canada, Aug. 15-17). ACM, New York, pp. 276-290. Google ScholarGoogle Scholar
  16. HERLIHY, M. P. 1991a. Randomized wait-free concurrent objects. In Proceedings of the lOth Annual ACM Symposium on Principles of Distributed Computing (Montreal, Que., Canada, Aug. 19-21). ACM, New York, pp. 11-21. Google ScholarGoogle Scholar
  17. HERLIHY, M. P. 1991b. Wait-free synchronization. ACM Trans. Prog. Lang. Syst. 13, 1 (Jan.), 124-149. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. HERLIHY, M. P., AND WING, J.M. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Prog. Lang. Syst. 12, 3 (July), 463-492. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. JAYANTI, P., CHANDRA, T. D., AND TOUEG, S. 1996. Fault-tolerant wait-free shared objects. Tech. Rep. TR 96-1565. Department of Computer Science, Cornell Univ., Ithaca, New York, January. Google ScholarGoogle Scholar
  20. JAYANTI, P., AND TOUEG, S. 1992. Some results on the impossibility, universality, and decidability of consensus. In Proceedings of the 6th Workshop on Distributed Algorithms (Haifa, Israel, Nov.). Lecture Notes in Computer Science, vol. 647. Springer-Verlag, New York. Google ScholarGoogle Scholar
  21. KLEINBERG, J., AND MULLAINATHAN, S. 1993. Resource bounds and combinations of consensus objects. In Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing (Ithaca, N.Y., Aug. 15-18). ACM, New York, pp. 133-144. Google ScholarGoogle Scholar
  22. LAMPORT, L. 1977. Concurrent reading and writing. Commun. ACM 20, 11 (Nov.), 806-811. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. LAMPORT, L. 1978. The implementation of reliable distributed multi-process systems. Comput. Netw. 2, 95-114.Google ScholarGoogle Scholar
  24. LAMPORT, L. 1986.On interprocess communication, Parts I and II. Distr. Comput. 1, 77-101.Google ScholarGoogle ScholarCross RefCross Ref
  25. LAMPORT, L., SHOSTAK, R., AND PEASE, M. 1982. The Byzantine generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 (July), 382-401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. LouI, M. C., AND ABU-AMARA. 1987. Memory requirements for agreement among unreliable asynchonous processes. Adv. Comput. Res. 4, 163-183.Google ScholarGoogle Scholar
  27. LYNCH, N., AND TUTTLE, M. 1988. An introduction to input/output automata. Tech. Rep. MIT/ LCS/TM-373. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google ScholarGoogle Scholar
  28. NEWMAN-WOLFE, R. 1987. A protocol for wait-free, atomic, multi-reader shared variables. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 232-248. Google ScholarGoogle Scholar
  29. PEASE, M., SHOSTAK, R., AND LAMPORT, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2 (Apr.), 228-234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. PETERSON, G.L. 1983. A new solution to Lamport's concurrent programming problem using small shared variables. ACM Trans. Prog. Lang. Syst. 5, 1 (Jan.), 56-65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. PETERSON, G., AND BURNS, J. 1987. Concurrent reading while writing II: The multi-writer case. In Proceedings of the 28th Annual Symposium on Foundations of Computer Science. IEEE, New York.Google ScholarGoogle Scholar
  32. PLOTKIN, S.A. 1989. Sticky bits and universality of consensus. In Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing (Edmonton, Alb., Canada, Aug. 14-16). ACM, New York, pp. 159-175. Google ScholarGoogle Scholar
  33. SCHAFFER, R. 1988. On the correctness of atomic multi-writer registers. Tech. Rep. TR No.: MIT/LCS/TM-364. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google ScholarGoogle Scholar
  34. SCHNEIDER, F.B. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299-319. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. SINGH, A. K., ANDERSON, J. H., AND GOUDA, M.G. 1987. The elusive atomic register, revisited. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 206-221. Google ScholarGoogle Scholar
  36. SRIKANTH, T. K., AND TOUEG, S. 1987. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Dist. Comput. 2, 2, 80-94.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. VIDYASANKAR, K. 1988. Converting Lamport's regular register to atomic register. IPL 28, 287-290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. VIDYASANKAR, K. 1989. An elegant 1-writer multireader multivalued atomic register. IPL 30, 221-223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. VITANYI, P., AND AWERBUCH, B. 1986. Atomic shared register access by asynchonous hardware. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science. IEEE, New York.Google ScholarGoogle Scholar

Index Terms

  1. Fault-tolerant wait-free shared objects

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                • Published in

                  cover image Journal of the ACM
                  Journal of the ACM  Volume 45, Issue 3
                  May 1998
                  177 pages
                  ISSN:0004-5411
                  EISSN:1557-735X
                  DOI:10.1145/278298
                  • Editor:
                  • F. T. Leighton
                  Issue’s Table of Contents

                  Copyright © 1998 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 1 May 1998
                  Published in jacm Volume 45, Issue 3

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • article

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader