Starting with the 6.0 release, FortiWeb offers a machine learning function that enables it to automatically detect malicious web traffic and bots. In addition to detecting known attacks, the feature can detect potential unknown zero-day attacks to provide real-time protection for web servers.
Machine Learning is intended to replace Auto Learn, which was removed in the 6.1 release.
The anomaly detection model of the machine learning feature observes the URLs, parameters, and HTTP methods of the HTTP and/or HTTPS sessions passing to your web servers. It builds mathematical models to detect abnormal traffic. To determine whether a request is legitimate or a potential attack attempt, it performs the following tasks:
- Captures and collects inputs, such as URL parameters, to build a mathematical model of allowed access
- Observes the HTTP method of the traffic
- Matches anomalies against pre-trained threat models
- Detects attacks
FortiWeb employs two layers of machine learning to detect malicious attacks. The first layer uses a Hidden Markov Model (HMM): it monitors access to the application and collects data to build a mathematical model for every parameter and HTTP method. Once the model is built, FortiWeb checks every request against it to determine whether the request is an anomaly.
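The idea behind the first layer can be sketched in a few lines. The example below is not FortiWeb's implementation: it swaps the HMM for a much simpler first-order Markov chain over character classes, and the class name `ParamModel`, the sample values, and the threshold of -1.0 are all assumptions chosen for illustration. It only shows how a model learned from observed parameter values can score a new value as normal or anomalous.

```python
import math
from collections import defaultdict


def char_class(ch):
    """Map a character to a coarse class: letter, digit, or symbol."""
    if ch.isalpha():
        return "A"
    if ch.isdigit():
        return "D"
    return "S"


class ParamModel:
    """Hypothetical per-parameter model: a Markov chain over character classes.

    This is a simplified stand-in for an HMM, for illustration only.
    """

    def __init__(self):
        # counts[a][b] = number of observed transitions from class a to class b
        self.counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _states(value):
        # "^" marks the start of the value and "$" marks its end
        return ["^"] + [char_class(c) for c in value] + ["$"]

    def train(self, samples):
        for value in samples:
            states = self._states(value)
            for a, b in zip(states, states[1:]):
                self.counts[a][b] += 1

    def avg_log_prob(self, value):
        states = self._states(value)
        total = 0.0
        for a, b in zip(states, states[1:]):
            row = self.counts.get(a, {})
            # Add-one smoothing over the 4 possible next states (A, D, S, $)
            p = (row.get(b, 0) + 1) / (sum(row.values()) + 4)
            total += math.log(p)
        return total / (len(states) - 1)

    def is_anomaly(self, value, threshold=-1.0):
        # The threshold is an arbitrary value picked for this example
        return self.avg_log_prob(value) < threshold


# Made-up observations for a numeric "id" parameter
model = ParamModel()
model.train(["1001", "1002", "2040", "77", "31337"])

print(model.is_anomaly("4521"))          # False: looks like the learned values
print(model.is_anomaly("1' OR '1'='1"))  # True: unfamiliar character pattern
```

A value that deviates from the learned pattern is only flagged as an anomaly at this stage; it is not yet treated as an attack.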
Once the first layer of machine learning flags a request as an anomaly, FortiWeb uses the second layer to verify whether it is a real attack or just a benign anomaly that should be ignored. To do so, FortiWeb includes pre-built, trained threat models. Each model represents a certain attack category, such as SQL Injection, Cross-site Scripting, and so on, and has already been trained on thousands of attack samples. Threat models are continuously updated through the FortiWeb Security Service. When new attack types emerge, the FortiGuard team analyzes the new threats and re-trains the relevant threat model. The new threat model is then pushed to all customer installations, in a way similar to how signatures are updated.
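The second-layer check can be pictured as follows. This is a rough sketch, not the actual threat models: the hand-written token weights in `THREAT_MODELS`, the function names, and the `min_score` cutoff are hypothetical stand-ins for models that, in the product, are trained on attack samples. The point is only that an anomalous value is reported as an attack when it resembles a known attack category, and is otherwise treated as benign.

```python
import re

# Hypothetical, hand-written token weights standing in for trained threat models
THREAT_MODELS = {
    "SQL Injection": {
        "'": 1.0, "or": 1.0, "union": 2.0, "select": 2.0, "--": 1.5, "=": 0.5,
    },
    "Cross-site Scripting": {
        "<": 1.0, ">": 1.0, "script": 2.0, "onerror": 2.0, "alert": 1.5,
    },
}


def tokenize(value):
    """Split a value into lowercase words, '--', and single symbols (digits are skipped)."""
    return re.findall(r"[a-z]+|--|[^\sa-z0-9]", value.lower())


def classify_anomaly(value, min_score=2.5):
    """Return the best-matching attack category, or None for a benign anomaly."""
    tokens = set(tokenize(value))
    best_category, best_score = None, 0.0
    for category, weights in THREAT_MODELS.items():
        # Sum the weights of the model's tokens that appear in the value
        score = sum(w for tok, w in weights.items() if tok in tokens)
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= min_score else None


print(classify_anomaly("1' OR '1'='1"))               # SQL Injection
print(classify_anomaly("<script>alert(1)</script>"))  # Cross-site Scripting
print(classify_anomaly("O'Brien"))                    # None (benign anomaly)
```

In this sketch, a value such as `O'Brien` may look unusual to the first layer, but it does not resemble any attack category closely enough, so it is ignored.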
See Configuring anomaly detection policy for more information.
The AI-based machine learning bot detection model complements the existing signature-based and threshold-based rules. It detects sophisticated bots that those rules can miss. The bot detection model observes user behavior across thirteen dimensions, for example, how many HTTP requests the user initiates, whether the requests use illegal HTTP versions, and whether they fetch JSON/XML resources.
Compared with traditional bot detection mechanisms, the bot detection model saves you the trouble of experimenting to find an appropriate threshold for abnormal user behavior. For example, how many HTTP requests from a single user should be considered abnormal? With the traditional mechanism, you may need to try different threshold values and keep checking the attack log until regular traffic no longer triggers related attack logs.
Things are much easier with the bot detection model. FortiWeb uses the SVM (Support Vector Machine) algorithm to build a bot detection model that self-learns the traffic profiles of regular clients. When traffic from a new client arrives, it is compared against the profiles of the regular clients. If it doesn't match, the bot detection model classifies the new client as an anomaly. When the traffic profiles of regular clients change dramatically (for example, the functions of your application have changed, so users behave differently when they visit it), FortiWeb automatically refreshes the bot detection model to adapt to the changes.
Moreover, testing shows that the bot detection model performs much better than the traditional mechanisms, especially at detecting crawlers and scrapers. Because traffic is evaluated comprehensively across thirteen dimensions, detection accuracy increases and the false positive rate decreases.
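To make the "learn the regular profile, then compare new clients against it" idea concrete, here is a minimal sketch. It does not use an SVM: it substitutes a plain normalized-distance check, and it uses only three of the thirteen dimensions. The feature values, function names, and the `max_distance` cutoff are invented for illustration and do not come from FortiWeb.

```python
import math
from statistics import mean, pstdev


def fit_profile(samples):
    """Learn the per-feature mean and spread of regular client traffic."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a feature never varies, to avoid dividing by zero
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def distance(profile, vector):
    """Normalized Euclidean distance between a client and the learned profile."""
    means, stds = profile
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(vector, means, stds))
    )


def is_bot(profile, vector, max_distance=4.0):
    # max_distance is an arbitrary cutoff chosen for this example
    return distance(profile, vector) > max_distance


# Made-up samples: (requests per minute, illegal-version requests, JSON/XML ratio)
regular_clients = [
    (12, 0, 0.10),
    (15, 0, 0.20),
    (9, 0, 0.15),
    (20, 0, 0.05),
    (14, 0, 0.10),
]
profile = fit_profile(regular_clients)

print(is_bot(profile, (13, 0, 0.12)))   # False: close to the regular profile
print(is_bot(profile, (300, 3, 0.90)))  # True: far from the regular profile
```

Note that no per-feature threshold is set by hand here: the acceptable range for each dimension is derived from the observed regular traffic, which is the advantage the learned model has over manually tuned threshold rules. Refreshing the model corresponds to calling `fit_profile` again on newer samples.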
See Configuring bot detection profiles for more information.
Machine learning is not fully supported when FortiWeb is deployed in active-active HA mode. It doesn't work on the secondary node. On the master node it works but is not always stable; for example, machine learning may stop working after a system reboot or an HA role switch.