Advanced self-healing for cloud applications
When running distributed systems, especially in the cloud, network partitions are inevitable, and they can lead to a condition commonly referred to as split brain syndrome.
Split brain syndrome, in a clustering context, is a state in which a cluster of nodes gets divided (or partitioned) into smaller clusters, each of which believes it is the only active cluster. Believing the other clusters are dead, each cluster may simultaneously access the same application data or disks, which can lead to data corruption.
Production Suite enhances Akka Cluster resilience and prevents data loss with predefined resolution strategies for recovering unreachable nodes during network partitions.
Because preconfigured resolution strategies are applied automatically, recovering failed nodes no longer requires manual intervention by operations staff, who are often on 24-hour watch to ensure resiliency in mission-critical applications.
Apply the best strategy
Because there is no “one size fits all” solution to this challenge, multiple strategies are offered to best fit the characteristics of the system: Static Quorum, Keep Majority, Keep Oldest, and Keep Referee.
Static Quorum
This strategy is a good choice when there is a fixed number of nodes in the cluster, or when a fixed number of nodes with a certain role can be defined.
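As a rough illustration of the decision a static quorum rule makes, the sketch below is written in plain Python rather than against the Akka API. It assumes the commonly described rule that one side of a partition keeps running only if it can still reach at least a configured quorum of nodes. The function name, the `quorum_size` parameter, and the returned strings are hypothetical and chosen for this example only.

```python
def static_quorum_decision(reachable_count: int, quorum_size: int) -> str:
    """Decide what one side of a partition does under a static quorum rule.

    Hypothetical helper for illustration; not part of any Akka API.
    """
    if quorum_size < 1:
        raise ValueError("quorum_size must be at least 1")
    # A side that still sees a quorum survives and removes the nodes it cannot reach.
    if reachable_count >= quorum_size:
        return "down-unreachable"
    # A side below the quorum shuts itself down so that only one side stays active.
    return "down-reachable"


# Example: a 5-node cluster with a quorum of 3 splits into sides of 3 and 2 nodes.
print(static_quorum_decision(3, 3))  # down-unreachable
print(static_quorum_decision(2, 3))  # down-reachable
```

Choosing a quorum larger than half of the fixed cluster size means two sides can never both reach it, which is why this rule depends on the node count being known in advance.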
Keep Majority
This strategy is a good choice when the number of nodes in the cluster changes dynamically and therefore Static Quorum cannot be used.
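A majority rule can be sketched in the same spirit. The plain Python below is not Akka code; it assumes that each side compares the nodes it can reach with the nodes it cannot, and that a tie is broken in favor of the side holding the lowest address. The function name is hypothetical, and comparing addresses as plain strings is a simplification made for this example.

```python
from typing import Set


def keep_majority_survives(reachable: Set[str], unreachable: Set[str]) -> bool:
    """Return True if the side that sees `reachable` should stay up.

    Hypothetical helper for illustration; not part of any Akka API.
    """
    if not reachable:
        raise ValueError("a side always contains at least the deciding node")
    # The larger side wins outright, whatever the current cluster size is.
    if len(reachable) != len(unreachable):
        return len(reachable) > len(unreachable)
    # Assumed tie-breaker: the side containing the lowest address survives.
    return min(reachable | unreachable) in reachable


side_a = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
side_b = {"10.0.0.4", "10.0.0.5"}
print(keep_majority_survives(side_a, side_b))  # True
print(keep_majority_survives(side_b, side_a))  # False
```

Because the threshold is derived from the membership known at the time of the partition, no fixed cluster size has to be configured.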
Keep Oldest
This strategy is good to use with Cluster Singleton. If the oldest node crashes, a new singleton instance will be started on the next oldest node.
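The idea of keeping the side that contains the oldest node can be sketched as follows. This is plain Python, not Akka code, and it assumes each member carries a join-order number where a lower number means an older member. The function name and the dictionary layout are hypothetical, and refinements such as handling an oldest node that is isolated on its own are not modeled.

```python
from typing import Dict


def keep_oldest_survives(reachable: Dict[str, int], unreachable: Dict[str, int]) -> bool:
    """Return True if the side that sees `reachable` contains the oldest member.

    Both arguments map a node name to its join order (lower means older).
    Hypothetical helper for illustration; not part of any Akka API.
    """
    members = {**reachable, **unreachable}
    if not members:
        raise ValueError("the cluster must have at least one member")
    # The oldest member is the one with the smallest join-order number.
    oldest = min(members, key=members.get)
    return oldest in reachable


side_a = {"node-a": 1, "node-b": 4}
side_b = {"node-c": 2, "node-d": 3, "node-e": 5}
print(keep_oldest_survives(side_a, side_b))  # True, although side_a is smaller
print(keep_oldest_survives(side_b, side_a))  # False
```

Under this rule the surviving side is not necessarily the larger one; what matters is that a singleton hosted on the oldest node keeps running where it already is.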
Keep Referee
This strategy is a good choice when one node hosts a critical resource that the system cannot run without.
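A referee rule reduces to a single membership check, sketched here in plain Python rather than Akka code. It assumes one designated node, the referee, decides which side survives. The function name and the node names are hypothetical.

```python
from typing import Set


def keep_referee_survives(reachable: Set[str], referee: str) -> bool:
    """Return True if the side that sees `reachable` still includes the referee.

    Hypothetical helper for illustration; not part of any Akka API.
    """
    # Only the side that can still reach the designated node keeps running.
    return referee in reachable


# Example: "db-gateway" hosts the resource the system cannot run without.
print(keep_referee_survives({"db-gateway", "worker-1"}, "db-gateway"))  # True
print(keep_referee_survives({"worker-2", "worker-3"}, "db-gateway"))    # False
```

The trade-off is that the referee becomes a single point of failure: if it is lost, no side passes the check.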