Failover performance

6.0.0

This section describes the designed device and link failover times for a FortiGate cluster and also shows results of a failover performance test.

Device failover performance

By design, FGCP device failover time is 2 seconds for a two-member cluster under ideal network and traffic conditions.

All cluster units regularly receive HA heartbeat packets from all other cluster units over the HA heartbeat link. If a cluster unit does not receive a heartbeat packet from another cluster unit for 2 seconds, the unit that stopped sending heartbeat packets is considered to have failed.

Failover time can be longer with more complex configurations or with network equipment that is slow to respond.

You can change hb-lost-threshold to increase or decrease the device failover time. See Modifying heartbeat timing for information about using hb-lost-threshold and other heartbeat timing settings.
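
As a minimal sketch, assuming the heartbeat interval is left unchanged, raising the lost heartbeat threshold makes the cluster tolerate more missed heartbeat packets before it declares a unit failed, which lengthens device failover time. The value 12 is an illustrative assumption, not a recommendation:

    config system ha
        set hb-lost-threshold 12
    end

A lower value has the opposite effect: failures are detected sooner, but a brief delay on the heartbeat link is more likely to be treated as a failure.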

Link failover performance

Link failover time is determined by how long the cluster takes to synchronize the cluster link database. When a link failure occurs, the cluster unit that experienced the link failure uses HA heartbeat packets to broadcast the updated link database to all cluster units. When all cluster units have received the updated database, the failover is complete.

It may take a few more seconds for the cluster to negotiate and redistribute communication sessions.
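
Link failover applies to interfaces that the cluster monitors. A minimal sketch of enabling interface monitoring, so that a failure of either link can trigger a link failover, might look like the following. The interface names port1 and port2 are hypothetical; substitute the interfaces connected to your networks:

    config system ha
        set monitor "port1" "port2"
    end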

Reducing failover times

  • Keep the network configuration as simple as possible, with as few network connections to the cluster as possible.
  • If possible, operate the cluster in transparent mode.
  • Use high-performance switches so that the switches fail over to interfaces connected to the new primary unit as quickly as possible.
  • Use accelerated FortiGate interfaces. In some cases, accelerated interfaces reduce failover times.
  • Make sure the FortiGate sends multiple gratuitous ARP packets after a failover. In some cases, sending more gratuitous ARP packets causes connected network equipment to recognize the failover sooner. To send 10 gratuitous ARP packets:

    config system ha
        set arps 10
    end

  • Reduce the time between gratuitous ARP packets. This may also cause connected network equipment to recognize the failover sooner. To send 50 gratuitous ARP packets with 1 second between each packet:

    config system ha
        set arps 50
        set arps-interval 1
    end

  • Reduce the lost heartbeat threshold and the heartbeat interval to detect a device failure more quickly. To set the lost heartbeat threshold to 3 packets and the heartbeat interval to 100 milliseconds (hb-interval is set in units of 100 milliseconds):

    config system ha
        set hb-interval 1
        set hb-lost-threshold 3
    end

  • Reduce the hello state hold-down time to shorten the time the cluster waits before transitioning from the hello state to the work state. To set the hello state hold-down time to 5 seconds:

    config system ha
        set hello-holddown 5
    end

  • Enable sending a link failed signal after a link failover to make sure that attached network equipment responds as quickly as possible to a link failure. To enable the link failed signal:

    config system ha
        set link-failed-signal enable
    end
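
After changing any of the settings above, it can help to confirm what the cluster is actually using. A minimal sketch, assuming CLI access to the primary unit, follows. The first command displays the HA settings that differ from their defaults, and the second displays the current cluster status. Output is omitted here because it varies by model and firmware version:

    show system ha
    get system ha status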
