Viewing bot detection model status
This option is enabled by default. It appears only when the model is in Ready status.
There are four statuses: Collecting, Building, Ready, and Failure.
- Collecting: The system is collecting samples.
- Building: The system is building the bot detection model.
- Ready: The model is ready to run. You can use the Model Detection option to run or stop the model.
- Failure: The model failed to build. Check the log messages for the failure reasons, and adjust the settings in the bot detection policy accordingly. The following is an example of the log message:
Model status changed from Building to Failure by FortiWeb daemon. Failed to create model. Could not build a model required by Model Settings. Please adjust the Model Building Settings to make sure Training Accuracy is lower 98.2222%, Cross Validation is lower than 99.1111% and Test Accuracy is lower than 97.3333%.
- Rebuild: The system rebuilds the model using the existing samples. This option is useful when the policy settings have changed and the bot detection model needs to be rebuilt with the adjusted settings.
- Refresh: The system re-collects samples and then rebuilds the model. This option is useful when you think the model is not accurate. You can also enable the Dynamically Update Model option in the bot detection policy to automatically refresh the model when too many false positive vectors are detected.
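When a build fails, the thresholds named in the log message indicate which Model Building Settings to adjust. The following is a minimal sketch, not a FortiWeb tool, that extracts those thresholds from the example Failure message above. The function name and the regular expression are assumptions based only on that single example, so other message variants may not match.

```python
import re

# Matches the three metrics named in the example Failure message.
# "than" is optional because the example reads "is lower 98.2222%".
THRESHOLD_RE = re.compile(
    r"(Training Accuracy|Cross Validation|Test Accuracy) is lower(?: than)? ([\d.]+)%"
)


def parse_failure_thresholds(message: str) -> dict[str, float]:
    """Return {metric name: threshold in percent} found in a Failure log message."""
    return {name: float(value) for name, value in THRESHOLD_RE.findall(message)}


# The example log message, copied from the text above.
EXAMPLE_MESSAGE = (
    "Model status changed from Building to Failure by FortiWeb daemon. "
    "Failed to create model. Could not build a model required by Model Settings. "
    "Please adjust the Model Building Settings to make sure Training Accuracy is "
    "lower 98.2222%, Cross Validation is lower than 99.1111% and Test Accuracy is "
    "lower than 97.3333%."
)

if __name__ == "__main__":
    for metric, limit in parse_failure_thresholds(EXAMPLE_MESSAGE).items():
        print(f"{metric}: {limit}")
```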
The Model Information section displays the anomalies detected in the Training Set and Test Set. You can switch between the Moderate Model and Strict Model.
For example, the following figure shows that one anomaly was detected in the Training Set using the Moderate Model. The Training Accuracy of the Moderate Model is 99.73%, the Testing Accuracy is 100%, and the Cross Validation value is 98.67%. The red line represents the anomaly. Hover over this line to see the values for each dimension.
The bot detection model evaluates user behavior in the following dimensions:
- TCP connection
The TCP connections created during the sampling period. Bots such as DoS tools and scanners typically create many more TCP connections than regular clients.
- HTTP request
The HTTP requests triggered during the sampling period. Bots typically trigger many more HTTP requests than regular clients.
- HTTP HEAD methods
The HTTP requests whose method is HEAD. Crawlers and scanners frequently use the HTTP HEAD method, while regular clients don’t.
- HTTP error responses
The HTTP error responses whose HTTP return code is larger than 400. Scanners typically trigger many HTTP error responses.
- HTTP requests without Referers
The HTTP requests that don’t have the Referer header field. Regular web access usually includes the Referer header field, while requests from bots such as scrapers may not include it.
- HTTP requests without User-Agent
The HTTP requests that don’t have the User-Agent header field. Bots such as DoS tools trigger HTTP traffic without the User-Agent header.
- HTTP requests with illegal HTTP version
The HTTP requests that use an HTTP version other than HTTP 1.1 or HTTP 2.0. Bots such as scanners trigger HTTP traffic using HTTP 0.9 or HTTP 1.0.
- HTML pages
The HTTP requests that access HTML pages. Regular web access usually triggers this kind of request, while bots such as scrapers may not. Scrapers tend to fetch pure site data, such as commodity prices.
- JSON/XML resources
The HTTP requests that access JSON/XML resources. Bots such as scrapers typically trigger a huge number of these requests.
- Request for robots.txt
The HTTP requests for the robots.txt file. Bots such as known search engines and crawlers usually attempt to fetch this file, while regular clients don’t.
- Seconds with throughput
The traffic triggered by regular clients usually doesn't last long, while the traffic from bots typically spans the whole sampling period.
- Average duration with throughput
The traffic duration of regular clients is typically much shorter than that of bots.
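To make the dimensions above more concrete, the following is a minimal sketch of how per-client counts for most of them could be derived from request records. It is an illustration only and is not how FortiWeb computes its vectors. The `Request` record layout, the field names, the content-type checks, and the reading of "seconds with throughput" as the number of distinct seconds containing at least one request are all assumptions. The average duration dimension is left out because the text does not define it precisely.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values, chosen to mirror the dimension descriptions above.
LEGAL_VERSIONS = {"1.1", "2.0"}
JSON_XML_TYPES = ("application/json", "application/xml", "text/xml")


@dataclass
class Request:
    """One HTTP request in a hypothetical access log (not a FortiWeb format)."""
    timestamp: float          # seconds since the start of the sampling period
    connection_id: int        # identifies the TCP connection that carried it
    method: str
    path: str
    http_version: str         # for example "1.1", "2.0", or "1.0"
    status: int               # HTTP return code of the response
    referer: Optional[str]
    user_agent: Optional[str]
    content_type: str         # response Content-Type


def count_dimensions(requests: list[Request]) -> dict[str, int]:
    """Count one client's values for the request-based dimensions."""
    return {
        "tcp_connections": len({r.connection_id for r in requests}),
        "http_requests": len(requests),
        "head_methods": sum(r.method.upper() == "HEAD" for r in requests),
        # "larger than 400", as stated in the dimension description
        "error_responses": sum(r.status > 400 for r in requests),
        "no_referer": sum(not r.referer for r in requests),
        "no_user_agent": sum(not r.user_agent for r in requests),
        "illegal_version": sum(r.http_version not in LEGAL_VERSIONS for r in requests),
        "html_pages": sum(r.content_type.startswith("text/html") for r in requests),
        "json_xml": sum(r.content_type.startswith(JSON_XML_TYPES) for r in requests),
        "robots_txt": sum(r.path == "/robots.txt" for r in requests),
        # Assumption: distinct whole seconds in which the client sent a request
        "seconds_with_throughput": len({int(r.timestamp) for r in requests}),
    }


# A made-up, scanner-like client used only to exercise the function.
SAMPLE = [
    Request(0.2, 1, "GET", "/robots.txt", "1.0", 200, None, None, "text/plain"),
    Request(0.7, 1, "HEAD", "/admin", "1.0", 404, None, None, "text/html"),
    Request(3.1, 2, "GET", "/api/prices", "1.1", 200, None, "curl/8.0", "application/json"),
]

if __name__ == "__main__":
    for name, value in count_dimensions(SAMPLE).items():
        print(f"{name}: {value}")
```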
The Model Statistics section shows the Traffic Trend (the green line), the Anomaly Trend (the orange line), and the Confirmed Bots (the blue line).
Provided that plenty of vectors were collected in the past 24 hours (Traffic Trend), a continuously wide gap between the Anomaly Trend and the Confirmed Bots means that many false positive vectors are being detected, and the current bot detection model may need to be refreshed.
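The check described above can be sketched as follows, assuming the three trends are available as equally spaced series of counts (for example, one value per hour). This is an illustration of the reasoning, not a FortiWeb feature. The function name and the three threshold parameters (`min_vectors`, `min_gap_ratio`, `min_consecutive`) are hypothetical values that you would need to tune.

```python
def needs_refresh(
    traffic: list[int],
    anomalies: list[int],
    confirmed_bots: list[int],
    min_vectors: int = 1000,      # assumed meaning of "plenty of vectors"
    min_gap_ratio: float = 0.5,   # assumed meaning of a "wide" gap
    min_consecutive: int = 6,     # assumed meaning of "continuously"
) -> bool:
    """Return True if the anomaly/confirmed-bot gap stays wide long enough."""
    if not (len(traffic) == len(anomalies) == len(confirmed_bots)):
        raise ValueError("all three series must have the same length")

    # Too little traffic: the trends are not meaningful enough to judge.
    if sum(traffic) < min_vectors:
        return False

    streak = 0
    for anomaly_count, bot_count in zip(anomalies, confirmed_bots):
        gap = anomaly_count - bot_count
        # The gap is "wide" when most anomalies were not confirmed as bots.
        wide = anomaly_count > 0 and gap / anomaly_count >= min_gap_ratio
        streak = streak + 1 if wide else 0
        if streak >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    hours = 24
    # Many anomalies, few confirmed bots: likely false positives.
    print(needs_refresh([100] * hours, [20] * hours, [3] * hours))
    # Most anomalies are confirmed bots: the model looks healthy.
    print(needs_refresh([100] * hours, [20] * hours, [18] * hours))
```

Requiring a run of consecutive wide gaps, rather than a single one, reflects the "continuously wide" condition in the text and avoids reacting to a short spike.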