What happens during a failover
The master node and primary slave nodes send heartbeats to each other to detect whether their peers are alive. If the master node is not accessible, such as during a reboot, a failover occurs. You can also configure a ping server so that the unit regularly checks its network condition and, if the check fails, downgrades itself to the primary slave type to trigger a failover. In a failover, the primary slave and master switch roles and the cluster IP addresses change, as indicated by the boxes in the lower image.
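The heartbeat and ping-server checks described above can be sketched as follows. This is only an illustrative model, not the product's implementation: the `HeartbeatMonitor` class, the `needs_failover` function, and the 3-second timeout are all assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical helper)."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed value, not a product default
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def record(self, peer):
        """Call whenever a heartbeat arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def is_alive(self, peer):
        """A peer counts as alive if its last heartbeat is within the timeout."""
        seen = self.last_seen.get(peer)
        return seen is not None and self.clock() - seen <= self.timeout_s


def needs_failover(master_alive, ping_ok):
    """Fail over if the master is silent or its ping-server check failed."""
    return (not master_alive) or (not ping_ok)
```

In this sketch a primary slave would call `record("master")` on every heartbeat and periodically evaluate `needs_failover(monitor.is_alive("master"), ping_ok)`.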
The failover logic handles two different scenarios:
Objective node available
The objective node is a slave node (either primary or regular) that can decide the new master. For example, if a cluster consists of one master node, one primary slave node, and one regular slave node, the regular slave node is the objective node.
After a primary slave node takes over the master role, the original master node will accept the decision when it is back online.
After the original master is back online, it will become a primary slave node.
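The outcome of this scenario amounts to a role swap between the master and the primary slave. The sketch below models only that end result; the role strings and node names are hypothetical labels chosen for the example, and the objective node's decision process itself is not modeled.

```python
def fail_over(roles):
    """Return a new role map in which master and primary slave swap roles.

    `roles` maps a node name to "master", "primary_slave", or
    "regular_slave" (hypothetical labels). Regular slaves are unchanged.
    """
    swapped = {}
    for node, role in roles.items():
        if role == "master":
            swapped[node] = "primary_slave"
        elif role == "primary_slave":
            swapped[node] = "master"
        else:
            swapped[node] = role
    return swapped


cluster = {
    "node-a": "master",
    "node-b": "primary_slave",
    "node-c": "regular_slave",  # acts as the objective node in this example
}
after = fail_over(cluster)
```

Here `after` maps `node-b` to the master role and `node-a` to the primary slave role, which is the state the original master accepts when it comes back online.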
No Objective node available
When there is no objective node in the cluster, the cluster topology is not stable and the failover process may take several rounds of role changes. This occurs when there is no communication between nodes because the cluster's internal communication is down. During the failover process, the final roles of master and primary slave are decided by three principal factors: the internal connections, the health check, and the serial number.
The internal connections in a cluster involve two ports: port1 and the cluster internal port, typically port2 depending on your configuration.
Port1 is used when a node promotes itself to master and needs confirmation from other nodes.
The cluster internal port is used by cluster nodes to detect whether their connections to other nodes in the cluster are available, and by a node to ask the primary slave to fail over when that node's health check fails.
The health check is used to check the connection with the ping server. If this connection fails in the master node, it triggers a failover.
Once the port1 connection is recovered, the unit with the newer serial number will keep the master role and the unit with the older serial number will become the primary slave.
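The serial-number tie-break can be illustrated with a small function. This is a sketch under a stated assumption: the text does not define how "newer" is determined, so the example simply treats the serial that sorts later as a string as the newer one. The function name and the placeholder serials are hypothetical.

```python
def resolve_by_serial(serial_a, serial_b):
    """Return (master_serial, primary_slave_serial) for two competing units.

    Assumption for illustration only: the "newer" serial number is the
    one that sorts later in plain string comparison.
    """
    if serial_a == serial_b:
        raise ValueError("serial numbers must be distinct")
    master = max(serial_a, serial_b)
    primary_slave = min(serial_a, serial_b)
    return master, primary_slave


# Placeholder serials, not real device identifiers.
master, primary_slave = resolve_by_serial("SN-0001", "SN-0002")
```

With these placeholder values, the unit `SN-0002` keeps the master role and `SN-0001` becomes the primary slave.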
When the new master is decided, it will:
- Build up the scan environment.
- Apply all the settings synchronized from the original master except the port3 IP and the internal communication port IP of the original master.
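The second step above can be pictured as a merge that skips the node-specific addresses. The following is a minimal sketch, assuming settings are held in a dictionary; the key names `port3_ip` and `internal_port_ip`, the function name, and the example addresses are hypothetical and do not reflect the product's actual configuration format.

```python
# Hypothetical keys for the two addresses that are not taken from the
# original master: its port3 IP and its internal communication port IP.
NODE_LOCAL_KEYS = {"port3_ip", "internal_port_ip"}


def settings_for_new_master(synced, local):
    """Combine synchronized settings with the new master's own local IPs.

    Everything in `synced` is applied except the node-local keys, which
    keep the values already present in `local`.
    """
    merged = {k: v for k, v in synced.items() if k not in NODE_LOCAL_KEYS}
    for key in NODE_LOCAL_KEYS:
        if key in local:
            merged[key] = local[key]
    return merged


synced_from_old_master = {
    "port3_ip": "192.0.2.10",
    "internal_port_ip": "198.51.100.10",
    "email_alerts": True,
}
new_master_local = {
    "port3_ip": "192.0.2.11",
    "internal_port_ip": "198.51.100.11",
}
effective = settings_for_new_master(synced_from_old_master, new_master_local)
```

In this example `effective` carries the synchronized `email_alerts` setting while `port3_ip` and `internal_port_ip` remain the new master's own values.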
After a failover occurs, the original master might become a primary slave node.
It keeps its original port3 IP and internal cluster communication IP. All other interface ports are shut down as it becomes a slave node. Some functionality, such as email alerts, is turned off. To reconfigure settings such as the interface IP, you must use a CLI command or the master's Central Management page.
Do not change the configuration of the new master before the old master has returned online, because the configuration could be lost. If it is absolutely necessary to reconfigure the new master, first remove the old master from the cluster using the CLI command
As the new master takes over, the port that client devices communicate with switches to it. Because the new master needs time to start all of its services, clients may experience a temporary service interruption.