A distributed system can be described as connected nodes lacking access to global memory.[1] Nodes of a distributed system maintain local variables with contents specifying the state of that node. Local states of the system are characterized in terms of a given node being able to read its own state and the state of a proximally connected - neighbor - node. The global state of the system is the union of all local states of all nodes of the system.
Disconnect between local and global information makes distributed systems susceptible to certain types of failures. For our purposes, faults are disruptions in expected behavior of a distributed system, which may be classified as e.g. permanent, intermittent, or transient. Following Dijkstra, we will focus on transient faults, faults that occur once then disappear but do not affect the global behavior of the system, e.g. message transmission timeout followed by transmission success. In particular, we will examine the notion self-stabilization for a system S with respect to some property P - defined over the global states of S – which identifies the correct behavior of S. Roughly, a system exhibiting self-stabilization can sustain arbitrary transient faults and return to a desirable system state without external intervention. Self-stabilization, in this respect, is much like a spinning top inscribing small concentric circles on a table. Lightly pressing the top – a transient fault – results in a self-stabilizing top inscribing larger concentric circles, before returning to its behavior of inscribing smaller concentric circles. Similarly, a distributed system exhibiting self-stabilization can weather transient faults without, say, crashing.
Broadly speaking, there are two types of global state for a given distributed system with respect to property P, called legitimate states – those which satisfy P - and illegitimate states – those which do not satisfy P. According to Dijkstra, legitimate states must be such that:[2]
(CLOSURE) Every transition from a legitimate state results only in a legitimate state[3]
(PAIR) Every pair of legitimate states can be connected by transitions from one to the other[4]
From which it is a corollary that each global state in sequence of transitions between legitimate states is itself legitimate.[5] Moreover, legitimate states must be such that:
(LEGIT) Every legitimate state has at least one privilege present
(PRIV) Each privilege must be present in at least one legitimate state
Where the privilege is a Boolean function defined in terms of a given node’s state and that of its neighbors. If values of a node and its neighbors are input to this Boolean function, then if the function returns the value “true”, then the privilege is said to be present, and “false” otherwise. If (LEGIT) is true with respect to some distributed system, then in any legitimate state of that system there is at least one node for which the privilege function returns “true.” If (PRIV) is true, then every possible output for the privilege function which returns “true” appears in at least one legitimate state.
Definitions in hand, Dijkstra defined a distributed system S as self-stabilizing with respect to P, on the condition that S satisfies the following two behaviors:
(ONE) The system has at most one privilege present
(CONVERGE) Given an arbitrary global state, the global state of S satisfies P within a finite number of state transitions
If (ONE) is true, a self-stabilizing system must have at most one set of state machine and neighbor inputs that result in “true.” A corollary of (LEGIT) and (PRIV), is that a self-stabilizing distributed system with respect to behavior P has exactly one privilege present in any given legitimate state and all privileges will eventually be represented. If (CONVERGE) is true, then the relevant distributed system satisfies P in finite transitions. Combined with (CLOSURE) and (PAIR), any given pair of legitimate states can be connected by a finite sequence of legitimate state transitions. Altogether, a self-stabilizing distributed system will satisfy P in a finite number of transitions, have one privilege, each privilege will be present in some legitimate state which will never transition to an illegitimate state, and each state in between any pair of legitimate states will involve only legitimate states. Another assumption worth mentioning here – if just to foreshadow - is that Dijkstra posited a central demon for self-stabilizing distributed systems with the ability to select a privileged node to enact a transition. The selection is arbitrary, and once a privileged node enacts a transition, the central demon arbitrarily chooses another among the available privileged nodes to enact the next transition. No node is permitted to enact a transition unless it has been selected by the central demon. Subtleties associated with the central demon assumption will occupy us later.
Before turning to that discussion, however, it is worth specifying Dijkstra’s self-stabilization definition in detail by considering one of three[6] characterizations Dijkstra provided of a self-stabilizing distributed system. Suppose there are N+1 nodes, numbered 0-N in a ring. For any given machine nr.i, where i is an integer between 0 and N, we adopt the following notation:
L: state of the left neighbor of nr.i, i.e. nr.(i-1)mod(N+1)
S: state of machine nr.i
R: state of right neighbor of nr.i, i.e. nr.(i+1)mod(N+1)
For illustration, suppose we have N=3, so there are N+1=4 machines, numbered 0-3 in a ring. Consider nr.0. We have the following:
L: state of the left neighbor of nr.0, i.e. nr.(-1)mod(4)=nr.3
S: state of machine nr.0
R: state of right neighbor of nr.0, i.e. nr.(1)mod(3)=1
To provide conditions under which such a distributed system self-stabilizes, Dijkstra distinguished exceptional machines from all others.[7] In particular, the conditions are, where K is the number of potential assignments to nodes, N the number of nodes, and N ≤ K:
Exceptional Node: If L=S, then assign (S+1) mod K to S
Other Node: If L≠S, then assign L to S
Without loss, suppose nr.0 is the exceptional machine. If there are N nodes each randomly assigned values in K initially, then any node - but the exceptional node - which has distinct assignments from their left neighbor will be privileged. The central demon will arbitrarily select one of these nodes for transition. Suppose nr.3 is chosen. Then nr.3 must have a state distinct from nr.2. But then nr.3 changes its state to that of nr.2, and so nr.3 loses privilege. If, moreover, nr.4 is given privilege, then it will change its own state to be that of nr.3. The procession should be clear. Ultimately, each node that is not exceptional will be in the same state. When this happens, only nr.0 will be privileged, at which point it will initiate a sequence of transitions moving the privilege around the ring.
Recall how Dijkstra’s central demon operates.[8] The central demon selects a privileged node to enact a transition. The selection is arbitrary, and once a privileged node enacts a transition, the central demon arbitrarily chooses another among the available privileged nodes to enact the next transition. No node is permitted to enact a transition unless it has been selected by the central demon. We can divide four distinct assumptions:
(UNIQUE) A distributed system has only one central demon
(SOLITARY) A central demon cannot choose more than one privileged node
(RANDOM) A central demon chooses among privileged nodes arbitrarily
(LOSS) A central demon cannot grant/remove privilege from a node without that node enacting a transition
We might wonder whether rejecting one or more of these assumptions undermines self-stabilization for a given distributed system. Suppose, for example, we reject (UNIQUE) while maintaining the other assumptions. Let there be two central demons. Suppose each demon arbitrarily and independently chooses privileged nodes for transition. If each central demon happens to choose the same node, the transition will proceed as if there was only one central demon. So suppose the central demons choose distinct privileged nodes. Then there will need to be some priority ranking which allows the system to decide which of the selected privileged nodes will make the transition first, since both are permitted to make system transitions. We might add a third central demon that arbitrarily selects a privileged node from the two nodes selected by the first and the second central demon, resulting in the selected node making the first transition. Under these conditions, it seems self-stabilization of a distributed system is not violated, since rejecting (UNIQUE) can be plausibly described – if three central demons are permitted – as a two-step implementation of the central demon operation.[15] Specifically, the setup is isomorphic to having a single central demon that selects two distinct nodes, then arbitrarily selects one pair to go first. More generally, this setup would – with n+1 central demons n of which select distinct privileged nodes and one of which arbitrarily orders the results – amount to a single central demon that selects n privileged nodes then orders them arbitrarily. In other words, a legitimate state will ultimately be reached, and will be closed under legitimate states through further transitions given the other central demon assumptions.
If we reject (SOLITARY) and maintain the rest, we seem to run into trouble since there is no obvious way for multiple privileged nodes selected to transition to determine which nodes will go first. Since we are assuming (UNIQUE), we cannot resolve this issue by appealing to more central demons. But we might adjust the capabilities of the central demon in line with rejecting (SOLITARY) by allowing the demon that arbitrarily chooses several nodes - say, nr.1, nr.2, and nr.3 – may then select among those nodes a priority ranking indicating which node transitions first, second, etc. As with (UNIQUE), since the central demon’s selection was initially among privileged nodes, and the transition of each node results in a loss of privilege for that node and potential gain of privilege for another, then transitioning in groups of ordered privileged nodes will eventually reach a legitimate state if transitioning through one node at a time. Moreover, once in a legitimate state, under these conditions the distributed system will remain in a legitimate state through transitions. Hence, again it seems rejecting one of Dijkstra’s implicit assumptions with respect to the central demon, namely, (SOLITARY) does not affect self-stabilization.
If we reject (RANDOM), then the central demon may exhibit a pattern of privilege selection, say, in order of nearest to the exceptional node, say, nr.0, in terms of counterclockwise rotation. For example, if the privileged states include nr.2, nr.7, and nr.8, then the central demon will select nr.2. On this ordering, the central demon’s selection being non-arbitrary will again not have an effect on whether the relevant distributed system will ultimately reach a legitimate state for reasons comparable to those involving the discussion of (UNIQUE) and (SOLITARY). Moreover, once in a legitimate state, the distributed system will not be led into an illegitimate state, since there will be only one privileged node to select, and so only one pattern for the central demon to employ.
The preceding discussion was warmup; if we reject (LOSS), then the central demon may arbitrarily alter the state of a node without enacting a transition, as well as arbitrarily choose a privileged state to enact a transition. There are broadly two ways in which this might affect self-stabilization. First, the central demon might remove privilege from a node; second, the central demon might grant privilege to a node. We consider each in turn.
Suppose the central demon is permitted to remove privilege from a node. Suppose nr.0 is the exceptional privileged state. Then f(nr.0)=f(nr.N). Suppose the central demon removes privilege from nr.0. Then f(nr.0)≠f(nr.N), but also f(nr.0)≠f(nr.1), and so nr.1 will be privileged. In this case, if the relevant distributed system is in a legitimate state, then it remains in a legitimate state despite the central demon’s action. Similar remarks apply to non-exceptional nodes with privilege. On the other hand, suppose there are two privileged nodes, f(nr.1)≠f(nr.2) and f(nr.3)≠f(nr.4), while f(nr.0)≠f(nr.N), f(nr.4)=f(nr.5), f(nr.5)=f(nr.N), and f(nr.2)=f(nr.3). Suppose the central demon removes privilege from nr.1, so that f(nr.1)=f(nr.2)=f(nr.3). Again, the actions of the central demon in this context will neither undermine reaching a legitimate state nor force a transition from a legitimate state to an illegitimate state. These observations suggests rejecting (LOSS) by allowing the central demon to arbitrarily remove privilege from a node does not undermine self-stabilization.
However, interpreting (LOSS) as allowing the central demon to arbitrarily grant privilege to a node does undermine self-stabilization. To illustrate, suppose there are two privileged nodes, f(nr.1)≠f(nr.2) and f(nr.3)≠f(nr.4), while f(nr.0)≠f(nr.N), f(nr.4)=f(nr.5), f(nr.5)=f(nr.N), and f(nr.2)=f(nr.3). Suppose the central demon’s actions repeat in the following manner: (1) grant privilege to nr.j for some j in N where j≠0, i.e. not exceptional; (2) reverse the transition of nr.j by granting nr.j privilege, e.g. altering the assignment of nr.j. If the central demon operates according to such a pattern – which is reasonable sequence given our rejection of (LOSS) and maintenance of the other assumptions – then a distributed system in an illegitimate state will not necessarily transition to a legitimate state. Moreover, rejecting (LOSS) in this manner makes trouble for maintaining closure over legitimate states as well. To illustrate, suppose a distributed system is in a legitimate state, e.g. f(nr.0)=f(nr.2)=f(nr.1) for N=2 and nr.0 the exceptional node. The central demon might grant nr.0 privilege, resulting in a transition to f(nr.0)≠f(nr.2)=f(nr.1) and nr.1 being privileged, but then alter nr.2 so that f(nr.0)=f(nr.2)≠f(nr.1), resulting in each of nr.0, nr.1, and nr.2 being privileged, i.e. an illegitimate state. This clearly violates (CLOSURE).
Our focus here Dijkstra’s commitment to a central demon. We identified features of this commitment, distinguishing several of which seem incidental to the maintenance of a self-stabilizing system. However, we did identify one crucial assumption, namely, that the central demon not be able to introduce privilege to the ring, which if rejected would undermine self-stabilization. This seems of particular importance when thinking about the security of networks and fault tolerance, since it seems plausible allowing a central demon to grant single privilege to a system might simulate a type of subtle attack on an otherwise self-stabilizing system, which might go unnoticed due to its maintenance what we identified as incidental central demon assumptions.
[1] Kshemkalyani, A.D. & Singhal, M. (2007). Distributed Computing: Principles, Algorithms, and Systems.
[2] Dijkstra, E.W. (1974). Self-Stabilizing Systems in Spite of Distributed Control. Communications of the ACM. 17(11): 643-644.
[3]More carefully: For any legitimate state s at transition point t, and state s’ at transition point t’, if t<t’, then s’ is a legitimate state.
[4]More carefully: For any legitimate states s, s’’ at respective transition points t, t’’, there is a state s’ at transition point t’ such that t<t’<t’’.
[5]More carefully: For any legitimate states s, s’’ at respective transition points t, t’’, if state s’ at t’ is such that t<t’<t’’, then s’ is a legitimate state. Proof: Suppose state s’ at t’ is such that t<t’<t’’. There is such an s’ by (PAIR) and s’ is a legitimate state by (CLOSURE).
[6] Dijkstra, E.W. (1986). A Belated Proof of Self-Stabilization. Distributed Computing. 1:5-6.
[7]This is an assumption worth exploring in more detail, but I focus on other aspects of Dijkstra’s algorithm in what follows. Hence, my not giving this assumption its own name here.
[8]Explanation of the central demon follows (Dijkstra, 1986) closely, supplemented by remarks in (Kshemkalyani & Singhal, 2007). I did not find any similarly detailed analysis of the various aspects of the central demon assumption.
[9]Indeed, since the selection of privileged nodes is arbitrary with Dijkstra’s assumptions, it seems the only difference that arises by rejecting (UNIQUE) is that some two-step (or n-step where n is less than or equal to the number of privileged states when the selection is made) will result in longer transition sequences leading to a legitimate state than others. Nevertheless, central demons selecting groups of privileged nodes to operate in what we might think of as ‘transition chunks’ will ultimately lead to a legitimate state as far as I can see.