What to do when coredump files are truncated or damaged
Sometimes you may find that a coredump file is 0 bytes, or that the stack information extracted from it is obviously truncated. This usually means the coredump file is truncated or damaged. To provide enough information to locate the root cause of a system or daemon crash, you need to resolve the problem and generate a complete coredump file.
1. Check whether there is enough disk space (especially on /var/log) to generate and store a coredump file:
/# df -h
Filesystem          Size      Used  Available  Use%  Mounted on
/dev/root         472.5M    335.7M     136.8M   71%  /
none                1.1G    116.0K       1.1G    0%  /tmp
none                3.8G      2.5M       3.8G    0%  /dev/shm
/dev/sdb1         362.4M    213.7M     129.1M   62%  /data
/dev/sdb3          90.6M     56.0K      85.6M    0%  /home
/dev/sda1         439.1G      7.5G     409.3G    2%  /var/log
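For illustration, the comparison that step 1 asks you to make can be written as a small Python sketch. It is not a FortiWeb command: it assumes a Linux host with Python 3, reads free space with the standard shutil.disk_usage call, and compares it with an estimated core size that you supply yourself (for example, roughly the memory footprint of the crashing daemon, which is only an assumption). The function name has_room_for_core and the 10% margin are hypothetical.

```python
import os
import shutil


def has_room_for_core(mount_point: str, estimated_core_bytes: int,
                      margin: float = 0.1) -> bool:
    """Return True if mount_point has room for a core of the given size.

    estimated_core_bytes is a caller-supplied guess (an assumption, e.g. the
    crashing daemon's memory footprint); margin adds headroom, so 0.1 asks
    for 10% more free space than the estimate.
    """
    free_bytes = shutil.disk_usage(mount_point).free
    required_bytes = int(estimated_core_bytes * (1 + margin))
    return free_bytes >= required_bytes


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Check /var/log if it exists on this host, otherwise fall back to "/".
    mount = "/var/log" if os.path.isdir("/var/log") else "/"
    # Hypothetical estimate: a 16 GiB proxyd core.
    print(mount, has_room_for_core(mount, 16 * GIB))
```

With the sample output above, /var/log has 409.3G available, so a core of that hypothetical size would fit comfortably; the root file system, with 136.8M available, would not.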
2. Check whether the generated coredump file is very large. In older versions (before 6.3.15), proxyd core files are limited to 50G.
3. Check whether there is a file system issue:
FortiWeb# execute fscklogdisk
This operation will fsck logdisk !
Do you want to continue? (y/n)y
4. Set enable-core-file to generate a complete coredump:
As mentioned in "Checking core files and basic coredump information", this option is enabled by default on 7.0.3 and earlier builds (including 6.3.x) and disabled by default on 7.0.4 and later builds. If the stack information alone does not contain the coredump information you need, enabling this option to generate coredump files for further investigation may help.
By default, if the coredump file is very large (typically on a FortiWeb appliance with a large amount of memory), generating the core file and writing it to disk can take a long time, from several minutes to more than 10 minutes. The drawbacks are that a reboot is triggered if the dump cannot complete within 120 seconds, and the daemon does not respond to new requests during this period.
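To see why a large core easily exceeds the 120-second limit, the dump time can be estimated as core size divided by write throughput. The following Python sketch does that arithmetic; the 100 MiB/s throughput and the core sizes are hypothetical figures, not FortiWeb measurements, and real disk throughput varies.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
DUMP_TIMEOUT_S = 120  # the 120 s limit described above


def dump_seconds(core_bytes: int, write_bytes_per_s: float) -> float:
    """Estimated time to write a core file at a constant throughput."""
    return core_bytes / write_bytes_per_s


def finishes_in_time(core_bytes: int, write_bytes_per_s: float,
                     timeout_s: float = DUMP_TIMEOUT_S) -> bool:
    """True if the estimated dump time fits within timeout_s."""
    return dump_seconds(core_bytes, write_bytes_per_s) <= timeout_s


if __name__ == "__main__":
    # Hypothetical figures: a 60 GiB core written at 100 MiB/s.
    secs = dump_seconds(60 * GIB, 100 * MIB)
    print(f"{secs:.1f} s, within timeout: {finishes_in_time(60 * GIB, 100 * MIB)}")
```

Under these assumptions a 60 GiB core takes about 614 seconds (just over 10 minutes), far beyond 120 seconds, while a 10 GiB core at the same rate takes about 102 seconds and would just fit.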
On FortiWeb 6.3.15 and later releases, a new value, enable-best-effort, is added to set enable-core-file. When this value is set, the "hung task timeout" does not take effect, so you can always expect the system to generate a complete coredump file. This is useful for analyzing a difficult issue, although the service may stop responding for a long time. Also, in 6.3.15 and later releases, the 50G core size limit has been removed.
FortiWeb# config server-policy setting
FortiWeb(setting) # set enable-core-file #only works for proxyd
disable Disable coredump for proxyd.
enable Enable coredump action for proxyd, stop if coredump cannot finish in hung task timeout seconds.
enable-best-effort Enable coredump action for proxyd, stop until the entire core file is generated.
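The difference between the three values can be summarized as a small decision model. This Python sketch is based only on the CLI help text above and is not FortiWeb's implementation; the names CoreFileMode and proxyd_core_outcome are hypothetical, and using 120 seconds as the hung task timeout is an assumption drawn from the 120-second limit mentioned earlier.

```python
from enum import Enum


class CoreFileMode(Enum):
    DISABLE = "disable"
    ENABLE = "enable"
    BEST_EFFORT = "enable-best-effort"


def proxyd_core_outcome(mode: CoreFileMode, dump_seconds: float,
                        hung_task_timeout_s: float = 120) -> str:
    """Model the outcome of a proxyd crash under each enable-core-file value.

    Based only on the CLI help text; the 120 s default is an assumption.
    """
    if mode is CoreFileMode.DISABLE:
        return "no core file"
    if mode is CoreFileMode.BEST_EFFORT:
        # Hung task timeout is ignored: keep writing until the core is complete.
        return "complete core"
    # ENABLE: the dump stops if it cannot finish within the timeout.
    if dump_seconds <= hung_task_timeout_s:
        return "complete core"
    return "stopped, core may be truncated"


if __name__ == "__main__":
    for mode in CoreFileMode:
        print(mode.value, "->", proxyd_core_outcome(mode, dump_seconds=600))
```

The model makes the trade-off explicit: only enable-best-effort guarantees a complete core for a long dump, at the cost of a longer period without service.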
5. Other related configuration:
There are several other options related to coredump settings:
You can set the maximum number of daemon coredump files stored on disk. If more core files are generated, the oldest one is removed.
FortiWeb (setting) # set core-file-count
This command only applies to daemon coredump files. For kernel core and coredump files, the limits are fixed: only 1 coredump file and up to 5 core files.
The limit applies to each daemon separately. For example, if the count is set to 3, up to 3 coredump files are kept for proxyd and up to 3 for ml_daemon, so a total of 6 coredump files can exist at the same time.
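The per-daemon retention rule can be modeled as one first-in, first-out queue per daemon. This Python sketch is a hypothetical model of the behavior described above, not FortiWeb code; the class name and the core file names are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class CoreFileRetention:
    """Hypothetical model of core-file-count: a separate limit per daemon."""

    def __init__(self, core_file_count: int) -> None:
        self.limit = core_file_count
        # daemon name -> file names, oldest first
        self.files: Dict[str, Deque[str]] = defaultdict(deque)

    def add(self, daemon: str, filename: str) -> Optional[str]:
        """Record a new core file; return the evicted oldest file, if any."""
        queue = self.files[daemon]
        queue.append(filename)
        if len(queue) > self.limit:
            return queue.popleft()
        return None

    def total(self) -> int:
        """Total number of core files kept across all daemons."""
        return sum(len(q) for q in self.files.values())


if __name__ == "__main__":
    store = CoreFileRetention(core_file_count=3)
    for i in range(4):
        store.add("proxyd", f"proxyd-core-{i}")
        store.add("ml_daemon", f"ml_daemon-core-{i}")
    print(store.total())
```

After four crashes of each daemon with the count set to 3, the oldest file of each daemon has been removed and 6 files remain in total, matching the example above.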
set corefile-ha-failover enable/disable for proxyd
This option was introduced in 7.0.4 and applies to HA scenarios. Previously, if a proxyd coredump occurred on the primary device in an HA group, HA failover did not happen, because the heartbeat still worked and neither the link status nor the priority changed. However, service was interrupted until the crashed daemon restarted successfully.
With this option enabled, as soon as the system detects that proxyd has started generating a coredump file, HA failover is triggered immediately, so service recovers much faster. The former primary device can then take as long as it needs to generate the coredump file without affecting application traffic.
For corefile-ha-failover to take effect, enable-core-file needs to be set to enable or enable-best-effort:
FortiWeb # show server-policy setting
config server-policy setting
set enable-core-file enable #or enable-best-effort
set corefile-ha-failover enable
The "set enable-core-file" and "set corefile-ha-failover" settings are NOT synchronized to other devices in the same HA group, so you need to configure them on each device as required.
Currently, only a proxyd coredump can trigger corefile-ha-failover; coredumps from other daemons do not trigger it.
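The trigger conditions stated in this section can be summarized as a single predicate. This Python sketch is a hypothetical model, not FortiWeb's logic: the function and parameter names are made up, and requiring the device to be the current primary is an assumption based on the fact that failover only changes anything when the primary crashes.

```python
CORE_FILE_ENABLED_VALUES = {"enable", "enable-best-effort"}


def triggers_ha_failover(daemon: str, is_primary: bool,
                         corefile_ha_failover: bool,
                         enable_core_file: str) -> bool:
    """Model of when a coredump triggers corefile-ha-failover (a sketch).

    Conditions taken from this article: only proxyd, the option must be
    enabled on this device (it is not synchronized), and enable-core-file
    must be enable or enable-best-effort. Requiring is_primary is an
    assumption: failover only changes anything on the primary device.
    """
    return (daemon == "proxyd"
            and corefile_ha_failover
            and enable_core_file in CORE_FILE_ENABLED_VALUES
            and is_primary)
```

For example, triggers_ha_failover("ml_daemon", True, True, "enable") is False because only proxyd is eligible, and triggers_ha_failover("proxyd", True, True, "disable") is False because enable-core-file is not set.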
This function works in AP, AAS, and AAHV modes, but enabling it is not recommended in HA Manager mode in public clouds, because the load balancers in front of the FortiWeb devices usually perform health checks and ensure that traffic is dispatched only to healthy nodes.
It is recommended to enable this option on only one FortiWeb, usually the primary device. Otherwise, proxyd coredumps occurring on both devices may cause HA to fail over back and forth between them.
For a more detailed description of this feature, refer to "How is FortiWeb appliance elected to be the primary node?" in the FAQ.