Bully Algorithm is a type of monarchical leader election.
- The system is synchronous.
- Processes may fail at any time, including during execution of the algorithm.
- There is a failure detector which detects failed processes.
- A process fails by stopping and returns from failure by restarting.
- Message delivery between processes is reliable.
- Each process knows its own process id and address, and that of every other process.
The algorithm uses the following message types:
- Election Message: Sent to announce election
- Answer (Alive) Message: Responds to the Election message
- Coordinator (Victory) Message: Sent by winner of the election to announce victory
When a process P recovers from failure, or the failure detector indicates that the current coordinator has failed, P performs the following actions:
- If P has the highest process id, it sends a Victory message to all other processes and becomes the new Coordinator. Otherwise, P broadcasts an Election message to all other processes with higher process IDs than itself.
- If P receives no Answer after sending an Election message, then it broadcasts a Victory message to all other processes and becomes the Coordinator.
- If P receives an Answer from a process with a higher ID, it sends no further messages for this election and waits for a Victory message. (If there is no Victory message after a period of time, it restarts the process at the beginning.)
- If P receives an Election message from another process with a lower ID it sends an Answer message back and starts the election process at the beginning, by sending an Election message to higher-numbered processes.
- If P receives a Coordinator message, it treats the sender as the coordinator.
- Let process ID be a Type 4 UUID. Note that UUIDs will need to be comparable.
- As an invariant, the process ID does not change during the lifetime of a process.
- The lifetime of a process will be denoted by an epoch stamped on all the members in a cluster.
- From leader election to leader death, the cluster's epoch does not change. A new round bumps the cluster's epoch.
- For improved reliability (a stretch goal), metadata of cluster members, leadership, epochs are/can be saved in persistent storage.
- Satisfying assumption #1: TCP is used as the transport.
- Satisfying assumption #2,6: addition/removal of cluster members will be broadcasted to all processes participating in leader election.
- Satisfying assumption #3: a heartbeat-based failure detector is provided.
- Satisfying assumption #4: processes obey fail-stop maxim.
- Satisfying assumption #5: Blocking I/O is used in conjunction with TCP as the transport.