article

Free Access

Fault-tolerant wait-free shared objects

Authors:
Prasad Jayanti

Dartmouth College, Honover, NH

Dartmouth College, Honover, NH
View Profile

,
Tushar Deepak Chandra

IBM T. J. Watson Research Center, Hawthorne, NY

IBM T. J. Watson Research Center, Hawthorne, NY
View Profile

,
Sam Toueg

Cornell Univ., Ithaca, NY

Cornell Univ., Ithaca, NY
View Profile

Authors Info & Claims

Journal of the ACM Volume 45 Issue 3pp 451–500https://doi.org/10.1145/278298.278305

Published:01 May 1998Publication History

Journal of the ACM

Abstract

Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects.

We identify two classes of object failures: responsive and nonresponsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With nonresponsive failures, a faulty object may also “hang” without responding. In each class, we define crash, omission, and arbitrary modes of failure.

We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes ℱ, object types T, and t ≥ 0, we show how to implement a shared object of type T which is t-tolerant for ℱ. Such an object remains correct and wait-free even if up to t base objects fail according to ℱ. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result.

Graceful degradation is a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure modes, we show wheter this property can be achieved, and, if so, how.

References

AFEK, Y., GREENBERG, D., MERRITT, M., AND TAUBENFELD, G. 1992. Computing with faulty shared memory. In Proceedings of the 11th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 47-58. Google Scholar
AFEK, Y., GREENBERG, D., MERRITT, M., AND TAUBENFELD, G. 1995. Computing with faulty shared objects. J. ACM 42, 6 (Nov.), 1231-1274. Google ScholarDigital Library
ASPNES, J. 1990. Time- and space-efficient randomized consensus. In Proceedings of the 9th Annual ACM Symposium on Principles of Distributed Computing (Quebec City, Que., Canada, Aug. 22-24). ACM, New York, pp. 325-332. Google Scholar
BERMAN, P., GARAY, J., AND PERRY, K. J. 1989. Towards optimal distributed consensus. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE, New York, pp. 410-415.Google Scholar
BLOOM, B. 1987. Constructing two-writer atomic registers. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 249-259. Google Scholar
BURNS, J. E., AND PETERSON, G.L. 1987. Constructing multi-reader atomic values from nonatomic values. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 222-231. Google Scholar
COAN, B.A. 1987. Achieving consensus in fault-tolerant distributed computer systems: Protocols, lower bounds, and simulations. Ph.D dissertation. Massachusetts Institute of Technology, Cambridge, Mass., June. Google Scholar
COAN, g. A., AND WELCH, J.L. 1992. Modular construction of a Byzantine agreement protocol with optimal message bit complexity. Inf. Comput. 97, 1, 61-85. Google ScholarCross Ref
COURTOIS, P. J., HEYMANS, R., AND PARNAS, D.L. 1971. Concurrent control with "readers" and "writers." Commun. ACM 14, 10 (Oct.), 667-668. Google ScholarDigital Library
DOLEV, D., DWORK, C., AND STOCKMEYER, L. 1987.On the minimal synchronism needed for distributed consensus. J. ACM 34, 1 (Jan.), 77-97. Google ScholarDigital Library
DOLEV, D., REISCHUK, R., AND STRONG, H.R. 1990. Early stopping in Byzantine agreement. J. ACM 37, 4 (Oct.), 720-741. Google ScholarDigital Library
FISCHER, M., LYNCH, N., AND MERRITT, M. 1986. Easy impossibility proofs for distributed consensus problems. Distr. Comput. 1, 26-39. Google ScholarDigital Library
FISCHER, M. J., LYNCH, N. A., AND PATERSON, M.S. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr.), 374-382. Google ScholarDigital Library
HALDAR, S., AND VIDYASANKAR, K. 1991. A simple construction of 1-writer multi-reader multivalued atomic variable from regular variables. Tech Rep. No. 9108. Department of Computer Science, Memorial University of Newfoundland, St. John's, NF, Canada.Google Scholar
HERLIHY, M. P. 1988. Impossibility and universality results for wait-free synchonization. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing (Toronto, Ont., Canada, Aug. 15-17). ACM, New York, pp. 276-290. Google Scholar
HERLIHY, M. P. 1991a. Randomized wait-free concurrent objects. In Proceedings of the lOth Annual ACM Symposium on Principles of Distributed Computing (Montreal, Que., Canada, Aug. 19-21). ACM, New York, pp. 11-21. Google Scholar
HERLIHY, M. P. 1991b. Wait-free synchronization. ACM Trans. Prog. Lang. Syst. 13, 1 (Jan.), 124-149. Google ScholarDigital Library
HERLIHY, M. P., AND WING, J.M. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Prog. Lang. Syst. 12, 3 (July), 463-492. Google ScholarDigital Library
JAYANTI, P., CHANDRA, T. D., AND TOUEG, S. 1996. Fault-tolerant wait-free shared objects. Tech. Rep. TR 96-1565. Department of Computer Science, Cornell Univ., Ithaca, New York, January. Google Scholar
JAYANTI, P., AND TOUEG, S. 1992. Some results on the impossibility, universality, and decidability of consensus. In Proceedings of the 6th Workshop on Distributed Algorithms (Haifa, Israel, Nov.). Lecture Notes in Computer Science, vol. 647. Springer-Verlag, New York. Google Scholar
KLEINBERG, J., AND MULLAINATHAN, S. 1993. Resource bounds and combinations of consensus objects. In Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing (Ithaca, N.Y., Aug. 15-18). ACM, New York, pp. 133-144. Google Scholar
LAMPORT, L. 1977. Concurrent reading and writing. Commun. ACM 20, 11 (Nov.), 806-811. Google ScholarDigital Library
LAMPORT, L. 1978. The implementation of reliable distributed multi-process systems. Comput. Netw. 2, 95-114.Google Scholar
LAMPORT, L. 1986.On interprocess communication, Parts I and II. Distr. Comput. 1, 77-101.Google ScholarCross Ref
LAMPORT, L., SHOSTAK, R., AND PEASE, M. 1982. The Byzantine generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 (July), 382-401. Google ScholarDigital Library
LouI, M. C., AND ABU-AMARA. 1987. Memory requirements for agreement among unreliable asynchonous processes. Adv. Comput. Res. 4, 163-183.Google Scholar
LYNCH, N., AND TUTTLE, M. 1988. An introduction to input/output automata. Tech. Rep. MIT/ LCS/TM-373. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
NEWMAN-WOLFE, R. 1987. A protocol for wait-free, atomic, multi-reader shared variables. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 232-248. Google Scholar
PEASE, M., SHOSTAK, R., AND LAMPORT, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2 (Apr.), 228-234. Google ScholarDigital Library
PETERSON, G.L. 1983. A new solution to Lamport's concurrent programming problem using small shared variables. ACM Trans. Prog. Lang. Syst. 5, 1 (Jan.), 56-65. Google ScholarDigital Library
PETERSON, G., AND BURNS, J. 1987. Concurrent reading while writing II: The multi-writer case. In Proceedings of the 28th Annual Symposium on Foundations of Computer Science. IEEE, New York.Google Scholar
PLOTKIN, S.A. 1989. Sticky bits and universality of consensus. In Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing (Edmonton, Alb., Canada, Aug. 14-16). ACM, New York, pp. 159-175. Google Scholar
SCHAFFER, R. 1988. On the correctness of atomic multi-writer registers. Tech. Rep. TR No.: MIT/LCS/TM-364. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
SCHNEIDER, F.B. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299-319. Google ScholarDigital Library
SINGH, A. K., ANDERSON, J. H., AND GOUDA, M.G. 1987. The elusive atomic register, revisited. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 206-221. Google Scholar
SRIKANTH, T. K., AND TOUEG, S. 1987. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Dist. Comput. 2, 2, 80-94.Google ScholarDigital Library
VIDYASANKAR, K. 1988. Converting Lamport's regular register to atomic register. IPL 28, 287-290. Google ScholarDigital Library
VIDYASANKAR, K. 1989. An elegant 1-writer multireader multivalued atomic register. IPL 30, 221-223. Google ScholarDigital Library
VITANYI, P., AND AWERBUCH, B. 1986. Atomic shared register access by asynchonous hardware. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science. IEEE, New York.Google Scholar

Index Terms

Recommendations

Fault-tolerant Wait-free Shared Objects
Read More
Fault-tolerant wait-free shared objects
SFCS '92: Proceedings of the 33rd Annual Symposium on Foundations of Computer Science

The authors classify object failures into two broad categories: responsive and non-responsive. They require that wait-free objects subject to responsive failures continue to respond (in finite time) to operation invocations. The responses may be ...
Read More
Robust wait-free hierarchies

The problem of implementing a shared object of one type from shared objects of other types has been extensively researched. Recent focus has mostly been on wait-free implementations, which permit every process to complete its operations on implemented ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Journal of the ACM Volume 45, Issue 3
May 1998
177 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/278298
Editor:
F. T. Leighton
Massachusetts Institute of Technology, Cambridge
Issue’s Table of Contents
Copyright © 1998 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 1998
Published in jacm Volume 45, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
MIMD
asynchronous computing
fault-tolerance
implementation
shared memory
shared objects
synchronization
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 71
  Total Citations
  View Citations
- 954
  Total Downloads
- Downloads (Last 12 months)41
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fault-tolerant wait-free shared objects

Journal of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

Fault-tolerant Wait-free Shared Objects

Fault-tolerant wait-free shared objects

Robust wait-free hierarchies

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Fault-tolerant wait-free shared objects

Journal of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

Fault-tolerant Wait-free Shared Objects

Fault-tolerant wait-free shared objects

Robust wait-free hierarchies

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media