Fault-tolerant wait-free implementations and robust wait-free hierarchies

January 1994

Author:
Prasad V. S. Jayanti
Cornell Univ., Ithaca, NY

Publisher:
Cornell University
PO Box 250, 124 Roberts Place, Ithaca, NY
United States

Order Number: UMI Order No. GAX94-09563

Abstract

In shared-memory systems, complex (shared) objects, such as queues and stacks, are implemented in software from simple objects, such as registers and test&set objects, which are often supported in hardware. Traditional implementations use lock-based techniques and are consequently not fault-tolerant: if any process crashes while holding the lock, the other processes are effectively prevented from accessing the implemented object. Wait-free implementations, which have been the focus of much recent research, were introduced to overcome this drawback. An implementation is wait-free if every access by a non-faulty process is guaranteed a response, regardless of whether the other processes are slow, fast, or have crashed. This thesis addresses the following two issues concerning wait-free implementations:

(1) Shared objects with wait-free implementations tolerate the failure of processes, but not the failure of the base (hardware) objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects.

We identify two classes of object failures: responsive and non-responsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With non-responsive failures, a faulty object may also "hang" without responding. In each class, we define crash, omission, and arbitrary modes of failure.

We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes ${\cal F}$, object types T, and $t \ge 0$, we show how to implement a shared object of type T which is t-tolerant for ${\cal F}$. Such an object remains correct and wait-free even if up to t base objects fail according to ${\cal F}$. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result.
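
The notion of t-tolerance can be illustrated with a small single-threaded sketch. It is not the construction from the thesis. It assumes a responsive crash model in which a failed base object answers every later operation with a distinguished value (called `BOTTOM` here), and it replicates a register over t + 1 base registers so that at least one replica stays correct. The class names, `BOTTOM`, and `demo` are all hypothetical, and the sketch makes no claim about correctness under concurrent access.

```python
# Single-threaded illustration only; names and failure model are assumptions.
BOTTOM = object()  # stands in for the response of a crashed base object


class BaseRegister:
    """Base register that may suffer an (assumed) responsive crash."""

    def __init__(self, value=None):
        self._value = value
        self._crashed = False

    def crash(self):
        # After this call the register still responds, but only with BOTTOM.
        self._crashed = True

    def write(self, value):
        if self._crashed:
            return BOTTOM
        self._value = value
        return "ack"

    def read(self):
        return BOTTOM if self._crashed else self._value


class TolerantRegister:
    """Register replicated over t + 1 base registers."""

    def __init__(self, t, initial=None):
        self.bases = [BaseRegister(initial) for _ in range(t + 1)]

    def write(self, value):
        # Write to every replica; crashed replicas simply answer BOTTOM.
        for base in self.bases:
            base.write(value)

    def read(self):
        # Return the first response that is not BOTTOM.
        for base in self.bases:
            response = base.read()
            if response is not BOTTOM:
                return response
        raise RuntimeError("more than t base objects failed")


def demo():
    reg = TolerantRegister(t=2, initial=0)
    reg.write(7)
    reg.bases[0].crash()
    reg.bases[2].crash()
    first = reg.read()
    reg.write(9)
    return first, reg.read()
```

With t = 2, the register in `demo` keeps returning the latest written value after two of its three base registers crash. Every operation also finishes in a bounded number of steps, because a crashed base object still responds.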

(2) Some objects are stronger than others in their ability to support wait-free implementations. It is thus natural to ask whether objects can be placed in a hierarchy accordingly. We identify robustness as a desirable property of such a hierarchy. Roughly speaking, a hierarchy is robust if no object at a given level has a wait-free implementation using objects at lower levels.

In this thesis, we investigate whether the well-known hierarchy proposed by Herlihy is robust. We prove that, contrary to popular belief, this hierarchy is not robust. Thus, objects at a low level in Herlihy's hierarchy are not necessarily weak: they can be used to implement wait-free objects at higher levels. We therefore propose three natural variants of Herlihy's hierarchy. We prove that two of these are not robust. The robustness of the third is open.
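
For background, Herlihy's hierarchy ranks an object by how many processes it can bring to agreement (consensus) in a wait-free manner. The sketch below shows the standard textbook protocol in which two processes reach consensus using one test&set bit and two registers. It is not taken from the thesis. The names `TasBit`, `TwoProcessConsensus`, and `run_once` are hypothetical, and the lock inside `TasBit` only simulates the atomicity that hardware would provide.

```python
import threading


class TasBit:
    """Simulated test&set bit; the lock stands in for hardware atomicity."""

    def __init__(self):
        self._bit = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        # Atomically set the bit to 1 and return its previous value.
        with self._guard:
            old = self._bit
            self._bit = 1
            return old


class TwoProcessConsensus:
    """Consensus for process ids 0 and 1 from one test&set bit."""

    def __init__(self):
        self._proposals = [None, None]  # one register per process
        self._tas = TasBit()

    def propose(self, pid, value):
        # Announce the proposal before competing for the bit.
        self._proposals[pid] = value
        if self._tas.test_and_set() == 0:
            return value  # winner decides its own value
        # The loser adopts the winner's value, which was announced
        # before the winner's test_and_set took effect.
        return self._proposals[1 - pid]


def run_once(v0, v1):
    consensus = TwoProcessConsensus()
    decided = [None, None]

    def worker(pid, value):
        decided[pid] = consensus.propose(pid, value)

    threads = [
        threading.Thread(target=worker, args=(pid, value))
        for pid, value in enumerate((v0, v1))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return decided
```

Calling `run_once("a", "b")` returns a two-element list in which both entries are equal and are one of the proposed values. Which value wins depends on thread scheduling.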

Contributors

Prasad V S Jayanti
Dartmouth College

Index Terms

Fault-tolerant wait-free implementations and robust wait-free hierarchies
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance
      2. Software performance

Recommendations

Fault-tolerant wait-free shared objects
SFCS '92: Proceedings of the 33rd Annual Symposium on Foundations of Computer Science

The authors classify object failures into two broad categories: responsive and non-responsive. They require that wait-free objects subject to responsive failures continue to respond (in finite time) to operation invocations. The responses may be ...
Fault-tolerant Wait-free Shared Objects
Fault-tolerant wait-free shared objects

Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and ...