The flagship project involved designing and implementing an industry-grade fault-tolerant distributed system that uses heartbeats, distributed consensus, total ordering, checkpointing, and logging to provide strong consistency for a replicated application. It supports two replication styles, active (hot-swap) and passive (primary-backup), along with mechanisms that ensure no downtime even as faults are injected.
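Heartbeat-based fault detection is the foundation of the system above. As a minimal sketch (class and parameter names here are illustrative assumptions, not taken from the project code), a detector can mark a replica as failed once it misses a fixed number of consecutive heartbeat deadlines:

```python
import time

# Hypothetical sketch: a replica is considered failed after it goes
# `max_missed` heartbeat intervals without being heard from.
class HeartbeatMonitor:
    def __init__(self, interval=1.0, max_missed=3):
        self.interval = interval      # seconds between expected heartbeats
        self.max_missed = max_missed  # misses tolerated before declaring a fault
        self.last_seen = {}           # replica id -> timestamp of last heartbeat

    def record_heartbeat(self, replica_id, now=None):
        self.last_seen[replica_id] = time.time() if now is None else now

    def is_alive(self, replica_id, now=None):
        now = time.time() if now is None else now
        last = self.last_seen.get(replica_id)
        if last is None:
            return False
        return (now - last) <= self.interval * self.max_missed
```

The timeout (interval times max_missed) trades detection latency against false positives under network jitter; the actual LFD/GFD code may use different values and message formats.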
Run command:
python replicate_manager.py
Run command:
python global_fault_detector.py
- Set rm_ip on line 8 before running the code
- Run on the same machine as the RM
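Conceptually, the GFD turns the per-server reports it receives from the LFDs into a global membership view. A minimal sketch of that bookkeeping (names are assumptions, not the project's actual API):

```python
# Illustrative sketch: the GFD keeps a membership set and returns a new
# view to broadcast whenever an LFD report actually changes membership.
class MembershipTracker:
    def __init__(self):
        self.members = set()

    def report(self, server_id, alive):
        """Apply one LFD report; return the new view if membership changed, else None."""
        changed = False
        if alive and server_id not in self.members:
            self.members.add(server_id)
            changed = True
        elif not alive and server_id in self.members:
            self.members.discard(server_id)
            changed = True
        if changed:
            return sorted(self.members)  # new view to send to RM and clients
        return None
```

Returning None on a no-op report keeps the GFD from rebroadcasting identical views on every heartbeat.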
- Multiple clients can be launched by passing different <client_id> values in the run command.
Run command:
python client.py <client_id>
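Under active replication, every live server answers every request, so a client sees duplicate replies. One common way to handle this (an assumed sketch, not necessarily what client.py does) is to keep the first reply per request id and drop the rest:

```python
# Sketch of client-side duplicate suppression for active replication:
# accept only the first reply for each request id.
class ReplyFilter:
    def __init__(self):
        self.seen = set()

    def accept(self, request_id, reply):
        """Return the reply if it is the first for this request id, else None."""
        if request_id in self.seen:
            return None               # duplicate from another replica
        self.seen.add(request_id)
        return reply
```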
- Change gfd_ip_address on line 12 to the Machine-1 IP address
- Run on the same machine as its server
Run command:
python local_fault_detector.py
Run command:
python server.py
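For replicas to stay consistent, all of them must apply requests in the same total order. One way to enforce this on the server side (a sketch under assumed naming, not necessarily the project's mechanism) is to apply requests strictly by sequence number, buffering any that arrive early:

```python
# Sketch: apply requests in sequence-number order, holding back
# out-of-order arrivals until the gap is filled.
class OrderedApplier:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}
        self.applied = []             # requests applied so far, in total order

    def deliver(self, seq, request):
        self.pending[seq] = request
        # Apply every consecutive request now available.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

Because every replica applies the same sequence, deterministic servers end up in the same state, which is what the fault-free tests below should observe.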
- Launch the RM
- Launch the GFD
- Launch LFD-1 and Server-1
- Launch LFD-2 and Server-2
- Launch LFD-3 and Server-3
------ End of Fault-free Testing ------
------ Start Fault Testing ------
- Kill one of the servers
- Wait for some time
- Bring back the dead server
- Clients and the other two servers should work normally and consistently during these steps, and the membership changes should be broadcast to all clients and existing servers
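The kill-and-recover steps above should produce a predictable sequence of membership views. As a small sketch (server names S1-S3 are assumed; this just replays up/down events, not the real GFD logic):

```python
# Sketch: replay launch/kill/recover events and collect the membership
# view the GFD would broadcast after each one.
def run_scenario(events):
    """events: list of (server_id, 'up' or 'down'); returns the view after each event."""
    members = set()
    views = []
    for server_id, action in events:
        if action == "up":
            members.add(server_id)
        else:
            members.discard(server_id)
        views.append(sorted(members))
    return views
```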
- Launch the RM
- Launch the GFD
- Launch LFD-1 and Server-1 (Primary)
- Launch LFD-2 and Server-2 (Backup-1)
- Launch LFD-3 and Server-3 (Backup-2)
------ End of Fault-free Testing ------
------ Start Fault Testing ------
- Kill one of the backup servers
- Wait for some time
- Bring back the dead backup server
- Kill the primary server
- Wait for some time
- Bring back the dead primary server (it should become a backup now)
- Clients and the other two servers should work normally and consistently during these steps, and the membership changes should be broadcast to all clients and existing servers
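The primary-failure steps above imply a promotion policy. One common choice (an assumed sketch, not necessarily the project's RM logic) is to keep an ordered replica list whose head is the primary: removing a failed head promotes the oldest backup, and a recovered replica rejoins at the tail as a backup.

```python
# Sketch: ordered replica list; order[0] is the primary. Failures remove,
# recoveries append, so a recovered ex-primary comes back as a backup.
class PrimaryBackupView:
    def __init__(self, replicas):
        self.order = list(replicas)   # order[0] is the primary

    def primary(self):
        return self.order[0] if self.order else None

    def fail(self, replica_id):
        self.order.remove(replica_id)  # removing the head promotes the next backup

    def rejoin(self, replica_id):
        self.order.append(replica_id)  # recovered replica becomes a backup
```

Before a backup can serve as primary it also needs the latest state, which is what the checkpointing mentioned in the overview provides.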