More troubleshooting information
The HA guide is useful for troubleshooting HA clusters. The following are links to sections with more information.
- If sessions are lost after a failover, you may need to change route-ttl to keep synchronized routes active longer. See Synchronizing kernel routing tables.
- In rare cases, for example after a cluster unit has been replaced, a cluster might not form because the disk partition sizes of the cluster units are different. Use the following command to check the disk storage checksum of each cluster unit. If the checksums are different, contact Fortinet support for help in setting up compatible storage partitions.
  diagnose sys ha showcsum 1 system | grep storage
- To control which cluster unit becomes the primary unit, you can change the device priority and enable override. See Controlling primary unit selection using device priority and override.
- Changes made to a cluster can be lost if override is enabled. See Configuration changes can be lost if override is enabled.
- When override is enabled, traffic might be disrupted after a failover if the primary unit rejoins the cluster before the session tables are synchronized, or for other reasons, such as when the primary unit is configured for DHCP or PPPoE. See Delaying how quickly the primary unit rejoins the cluster when override is enabled.
- In some cases, age differences among cluster units can result in the wrong cluster unit becoming the primary unit. For example, if a cluster unit set to a high priority reboots, that unit will have a lower age than other cluster units. You can resolve this problem by resetting the age of one or more cluster units. See Primary unit selection and age. You can also adjust how sensitive the cluster is to age differences. This can be useful if large age differences cause problems. See Cluster age difference margin (grace period) and Changing the cluster age difference margin.
- If one cluster unit needs to be serviced or removed from the cluster, you can do so without affecting the operation of the cluster. See Disconnecting a cluster unit from a cluster.
- If FGSP is enabled, the GUI and CLI do not allow you to configure HA. See FortiGate Session Life Support Protocol (FGSP).
- If one or more FortiGate unit interfaces are configured as PPTP or L2TP clients, the GUI and CLI do not allow you to configure HA.
- FGCP is compatible with DHCP and PPPoE, but be careful when configuring a cluster that includes a FortiGate interface configured to get its IP address with DHCP or PPPoE. Fortinet recommends turning on DHCP or PPPoE addressing for an interface after the cluster has been configured. See FortiGate HA compatibility with DHCP and PPPoE.
- Some third-party network equipment may prevent HA heartbeat communication, resulting in the failure of the cluster or the creation of a split brain scenario. For example, some switches use packets with the same Ethertype as HA heartbeat packets for their own internal functions. When such a switch is used for HA heartbeat communication, it generates CRC errors and does not forward the heartbeat packets. See Heartbeat packet Ethertypes.
- Very busy clusters might not be able to send HA heartbeat packets quickly enough resulting in a split brain scenario. You might be able to resolve this problem by modifying HA heartbeat timing. See Modifying heartbeat timing.
- Very busy clusters might have performance degradation if session pickup is enabled. If possible, you can disable this feature to improve performance. If you require session pickup for your cluster, there are options for improving session pickup performance. See Improving session synchronization performance.
- If it takes longer than expected for a cluster to fail over, try changing how the primary unit sends gratuitous ARP packets. See Changing how the primary unit sends gratuitous ARP packets after a failover.
- You can improve failover times by configuring the cluster for subsecond failover. See Subsecond failover and Failover performance.
- When you first put a FortiGate unit in HA mode, you might lose connectivity to the unit because HA changes the MAC addresses of all FortiGate unit interfaces, including the one that you are connecting to. The cluster MAC addresses also change if you change some HA settings, such as the cluster group ID. The connection is restored after a moment, once your network and PC update to the new MAC address. To reconnect more quickly, you can update the ARP table of your management PC by deleting the ARP table entry for the FortiGate unit (or by deleting all ARP table entries). You might be able to delete the ARP table of your management PC using a command similar to arp -d.
- Since HA changes all cluster unit MAC addresses, if your network uses MAC address filtering, you might have to make configuration changes to account for the HA MAC addresses.
- A network might experience packet loss when two FortiGate HA clusters are deployed in the same broadcast domain, because the two clusters can end up with conflicting MAC addresses. Diagnose packet loss by pinging from one cluster to the other or by pinging both of the clusters from a device in the broadcast domain. You can resolve the MAC address conflict by changing the HA Group ID configuration of the two clusters. The HA Group ID is sometimes called the Cluster ID. See Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain.
- If there is a synchronization problem between the primary unit and one or more subordinate units, the cluster CLI displays slave is not in sync messages. See How to diagnose HA out of sync messages.
- If you have configured dynamic routing and the new primary unit takes too long to update its routing table after a failover, you can configure a graceful restart and also optimize how routing updates are synchronized. See Configuring graceful restart for dynamic routing failover and Synchronizing kernel routing tables.
- Some switches might not be able to detect that the primary unit has become a backup unit and will keep sending packets to the former primary unit. This can occur after a link failover if the switch does not detect the failure and does not clear its MAC forwarding table. See Updating MAC forwarding tables when a link failover occurs.
- If a link not directly connected to a cluster unit fails, such as between a switch connected to a cluster interface and the network, you can enable remote link failover to maintain communication. See Remote link failover.
- If you find that some cluster units are not running the same firmware build, reinstall the correct firmware build on the cluster to upgrade all cluster units to the same firmware build. See Synchronizing the firmware build running on a new cluster unit.
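
Several of the items above refer to settings that are changed under the config system ha command. As a rough sketch only, the fragment below shows where route-ttl, device priority, override, the group ID, heartbeat timing, and the gratuitous ARP options are set. Every value shown is an illustrative assumption, not a recommendation, and option names, ranges, and defaults can differ between FortiOS releases, so check the CLI reference for your firmware before using any of them. Lines beginning with # are annotations for the reader and are not meant to be typed into the CLI.

```
config system ha
    # Illustrative values only; confirm names and ranges for your FortiOS release.
    # Keep synchronized routes active longer after a failover (seconds).
    set route-ttl 60
    # Prefer this unit as primary: higher device priority plus override.
    set priority 200
    set override enable
    # Use a different group ID on each cluster that shares a broadcast domain.
    set group-id 10
    # Heartbeat timing: interval (in units of 100 ms) and lost-heartbeat threshold.
    set hb-interval 4
    set hb-lost-threshold 12
    # Gratuitous ARP after a failover: number of packets and seconds between them.
    set arps 5
    set arps-interval 8
end
```

Changing one setting at a time and watching the cluster after each change makes it easier to tell which change resolved the problem. Note that changing the group ID changes the cluster MAC addresses, so a brief loss of connectivity is possible, as described above.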