A single point of failure (SPOF) is a part of a system that, if it fails, stops the entire system from working. A highly available or reliable system cannot have a SPOF. We can remove SPOFs by employing the following techniques:

  • Introduce redundancy: Add a secondary resource that takes over when the primary resource fails. In standby redundancy, the failover typically takes some time to complete, and the resource remains unavailable during that period. Use standby redundancy for stateful components. In active redundancy, requests are distributed across multiple nodes. If a node fails, its workload is redistributed among the healthy nodes.

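  • Detect Failure: Design health checks that reliably indicate whether each backend node can serve requests, so that failures are detected quickly and traffic can be routed away from unhealthy nodes.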

  • Durable Data Storage: Synchronous replication acknowledges a transaction only after it has been durably stored in both the primary location and its replicas. It is ideal for protecting the integrity of the data in the event of a failure of the primary node. In asynchronous replication, changes on the primary node are not immediately reflected on the replicas, so the replicas may lag behind the primary. This makes it best suited for horizontal scaling, for example serving reads that can tolerate slightly stale data from the replicas.
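
To make the redundancy and failure-detection points above concrete, here is a minimal Python sketch of active redundancy, assuming a simple round-robin policy. The Node and Pool classes and the healthy flag are hypothetical names invented for this illustration; in a real system the flag would be set by an actual health probe, and the routing would usually be done by a load balancer rather than by application code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True  # assumption: stands in for the result of a real health probe


class Pool:
    """Hypothetical round-robin pool that only routes to healthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # counter used to rotate through the healthy nodes

    def healthy_nodes(self):
        return [n for n in self.nodes if n.healthy]

    def pick(self):
        candidates = self.healthy_nodes()
        if not candidates:
            # With no healthy node left, the pool itself has failed.
            raise RuntimeError("no healthy nodes available")
        node = candidates[self._next % len(candidates)]
        self._next += 1
        return node


pool = Pool([Node("a"), Node("b"), Node("c")])
print([pool.pick().name for _ in range(3)])  # ['a', 'b', 'c']

# Simulate node "b" failing its health check: traffic goes to "a" and "c" only.
pool.nodes[1].healthy = False
print(sorted({pool.pick().name for _ in range(4)}))  # ['a', 'c']
```

The point of the sketch is that no failover step is needed: the failed node simply drops out of the candidate list and the remaining nodes absorb its share of requests.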
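
The difference between synchronous and asynchronous replication described above can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular database implements replication: the Primary and Replica classes, the pending queue, and the flush method are all hypothetical names chosen for the illustration.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Toy primary node (hypothetical); replicates synchronously or asynchronously."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # async mode only: changes not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after every replica has stored the change.
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # Acknowledge immediately; replicas catch up later.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        """Ship queued changes to the replicas (async mode)."""
        while self.pending:
            key, value = self.pending.pop(0)
            for replica in self.replicas:
                replica.apply(key, value)


sync_replica = Replica()
Primary([sync_replica], synchronous=True).write("k", 1)
print(sync_replica.data)   # {'k': 1}

async_replica = Replica()
async_primary = Primary([async_replica], synchronous=False)
async_primary.write("k", 1)
print(async_replica.data)  # {} (replica lags behind the primary)
async_primary.flush()
print(async_replica.data)  # {'k': 1}
```

In the asynchronous case, a primary failure before flush would lose the acknowledged write, which is why the synchronous mode is the one that protects data integrity, at the cost of waiting for the replicas on every write.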

Reference: AWS Whitepapers & Guides