Unreliable failure detectors for reliable distributed systems

Authors: Tushar Chandra, S. Toueg

Summary: Abstract: "It is well known that Consensus, a fundamental problem of fault-tolerant distributed computing, cannot be solved in asynchronous systems with crash failures. This impossibility result stems from the lack of reliable failure detection in such systems. To circumvent such impossibility results, we introduce the concept of unreliable failure detectors that can make mistakes, and study the problem of using them to solve Consensus. We characterize unreliable failure detectors by two types of properties: completeness and accuracy. Informally, completeness requires that the failure detector eventually suspects every process that actually crashes, while accuracy restricts the mistakes that it can make. We define a hierarchy of failure detectors based on the strength of their accuracy ..."
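To make "completeness" and "accuracy" concrete, here is a minimal sketch of one common way to build an unreliable failure detector: heartbeats plus timeouts. This construction is an assumption for illustration, not the method described in the report; the class `HeartbeatDetector`, its methods, and the timeout-doubling rule are hypothetical. A crashed process stops sending heartbeats, so it is eventually suspected forever (completeness). A slow but correct process can be wrongly suspected (a mistake); the detector withdraws the suspicion when a heartbeat arrives and doubles that process's timeout. Under the added assumption that heartbeats are periodic and message delays eventually become bounded, each correct process is then wrongly suspected only finitely often, which restricts the detector's mistakes (a limited form of accuracy).

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector (illustrative sketch only).

    A process is suspected when no heartbeat from it has arrived within its
    current timeout. A heartbeat from a suspected process reveals a mistake:
    the suspicion is withdrawn and that process's timeout is doubled.
    """
    processes: list[str]
    initial_timeout: float = 1.0
    last_heartbeat: dict[str, float] = field(default_factory=dict)
    timeout: dict[str, float] = field(default_factory=dict)
    suspected: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Treat every process as heard from at time 0 with the initial timeout.
        for p in self.processes:
            self.last_heartbeat[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def heartbeat(self, p: str, now: float) -> None:
        """Record a heartbeat from process p received at time `now`."""
        self.last_heartbeat[p] = now
        if p in self.suspected:
            # Mistaken suspicion: p is alive. Withdraw it and be more patient.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def tick(self, now: float) -> set[str]:
        """Suspect every process silent for longer than its timeout."""
        for p in self.processes:
            if now - self.last_heartbeat[p] > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


if __name__ == "__main__":
    fd = HeartbeatDetector(["p", "q"])
    fd.heartbeat("p", 0.5)
    fd.heartbeat("q", 0.5)
    print(fd.tick(1.0))   # nobody suspected yet
    fd.heartbeat("p", 1.4)
    print(fd.tick(1.6))   # q is slow, not crashed: a mistaken suspicion
    fd.heartbeat("q", 1.7)
    print(fd.tick(1.8))   # suspicion of q withdrawn, q's timeout now 2.0
    fd.heartbeat("q", 2.5)
    print(fd.tick(3.0))   # p has crashed (no more heartbeats): suspected
```

In the demo, q is suspected at time 1.6 even though it never crashes; that is exactly the kind of mistake an unreliable detector is allowed to make. Process p stops sending after time 1.4 and, from time 3.0 on, stays suspected at every later tick.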

Print Book, English, [1993]

Publisher: Cornell University, Dept. of Computer Science, Ithaca, N.Y., [1993]