Configuring ClickHouse Storage
This document describes the steps to configure ClickHouse storage. Depending on your ClickHouse cluster configuration, there are various possibilities:
- Storage may be required on Supervisor nodes and Worker nodes.
- Storage can be Online or Archived.
- Online storage can be tiered – Hot, Warm and Cold.
- Online storage can be:
- Directly attached to the host, or
- NFS mounted
- Archive storage can be AWS S3 or GCS.
Note that after configuring storage, you need to form the ClickHouse cluster by going to Admin > Settings > Database > ClickHouse Config.
Configuring ClickHouse Storage on Supervisor Node
Follow these steps:
- Navigate to Admin > Setup > Storage and click Online to choose storage.
- From the Event Database drop-down list, select ClickHouse.
- Set Storage Tiers and the disks for each tier.
- If your hardware model is 3600G, 2200G, or 2000G, the Storage Tiers count is set to 2 and the Hot Tier and Warm Tier disks are pre-configured. To add more storage, set Storage Tiers to 3 and add NFS-mounted storage in the Cold Tier.
- If your hardware model is 3500G, 2000F, or 500G, the Storage Tiers count is set to 1 and the Hot Tier is pre-configured. To add more storage, set Storage Tiers to 2 or 3 and add NFS-mounted storage in the Warm Tier or Cold Tier.
- If you are running on a VM, you can set Storage Tiers to 1, 2, or 3 and add disks to each tier.
- To add a disk at any tier, click + and enter the Disk Path.
- To specify a locally attached Disk Path, run the lsblk command to find the Disk Path, which should be of the form ‘/dev/<disk>’.
- To specify an NFS-mounted Disk Path, enter it in the following format:
<NFS Server IP or HostName>:<exported mount point>
Example: 192.0.20.0:/mnt/warm
For more information, see steps 1 and 2 in the NFS Storage Guide.
Note: Use a different mount point for each Worker node and each tier; otherwise, data will be overwritten.
- Click Test and, if the test succeeds, click Save.
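The tier and Disk Path rules above can be checked before you enter values in the GUI. The following Python sketch is illustrative only and is not part of FortiSIEM: the names TIER_OPTIONS, classify_disk_path, and find_shared_nfs_paths are hypothetical, the regular expressions are an assumed reading of the two Disk Path formats, and the selectable tier counts are inferred from the hardware model descriptions above.

```python
import re

# Assumed reading of the text above: (pre-configured tier count, counts you may select).
TIER_OPTIONS = {
    "3600G": (2, (2, 3)),
    "2200G": (2, (2, 3)),
    "2000G": (2, (2, 3)),
    "3500G": (1, (1, 2, 3)),
    "2000F": (1, (1, 2, 3)),
    "500G": (1, (1, 2, 3)),
    "VM": (None, (1, 2, 3)),  # nothing is pre-configured on a VM
}

# Locally attached disk, for example /dev/sdb (hypothetical pattern).
LOCAL_DISK = re.compile(r"^/dev/[A-Za-z0-9_./-]+$")
# <NFS Server IP or HostName>:<exported mount point> (hypothetical pattern).
NFS_EXPORT = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]*:/\S*$")


def classify_disk_path(path):
    """Return 'local' or 'nfs'; raise ValueError for an unrecognized Disk Path."""
    if LOCAL_DISK.match(path):
        return "local"
    if NFS_EXPORT.match(path):
        return "nfs"
    raise ValueError(f"unrecognized Disk Path: {path!r}")


def find_shared_nfs_paths(assignments):
    """Given (node, tier, disk_path) tuples, return NFS paths assigned more than once."""
    seen = {}
    for node, tier, path in assignments:
        if classify_disk_path(path) == "nfs":
            seen.setdefault(path, []).append((node, tier))
    return {path: users for path, users in seen.items() if len(users) > 1}


# Example plan: two Workers pointing at the same export would overwrite each other.
plan = [
    ("worker1", "hot", "/dev/sdb"),
    ("worker1", "warm", "192.0.20.0:/mnt/warm"),
    ("worker2", "warm", "192.0.20.0:/mnt/warm"),
]
print(find_shared_nfs_paths(plan))
```

Any path reported by find_shared_nfs_paths is a mount point that is used by more than one node or tier and should be changed before you click Test.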
Configuring ClickHouse Storage on Worker Node
The steps are described in Adding a Worker Node.
Configuring ClickHouse Archive Storage
In some situations, a large Cold Tier storage may suffice as archive storage. You can also add specialized archive storage in cloud environments:
Configuring AWS S3 for Archive
To configure AWS S3 for Archive, follow these steps:
- Go to Admin > Setup > Storage.
- Click Archive, and select AWS S3.
- For Credential Type, select Environmental Credentials or Explicit Credentials.
- If Environmental Credentials is selected, you need an Identity and Access Management (IAM) policy. Follow the instructions in Creating IAM Policy for AWS S3 Explicit Credentials to create an IAM policy.
- If Explicit Credentials is selected, then enter the following information:
- Access Key ID: Access Key ID required to access the S3 bucket(s)
- Secret Access Key: The Secret Access Key associated with the Access Key ID to access the S3 bucket(s)
- For Buckets:
- In the Bucket field, enter the bucket URL.
- In the Region field, enter the region. For example, "us-east-1".
Note: To minimize any latency, enter the closest region.
- If more Buckets are required, click + to add a new row.
- Click Test.
- If the test succeeds, click Deploy.
- Configure each ClickHouse Worker to use the configured S3 bucket.
- Navigate to Admin > License > Nodes, edit each Worker, check AWS S3 and choose the Bucket from the drop-down.
- Click Test, and if the test succeeds, click Deploy.
- If the Supervisor is used as a ClickHouse node, take the following steps:
- Navigate to Admin > Setup > Storage, click Online, check AWS S3 and choose the Bucket from the drop-down.
- Click Test, and if the test succeeds, click Deploy.
- Apply AWS S3 as the new storage policy to the ClickHouse cluster by taking the following steps.
- Navigate to Admin > Settings > Database > ClickHouse Config.
- Add the AWS S3 bucket(s) to your ClickHouse Cluster Configuration using the appropriate Shard # > Replica # drop-down list.
- Click Test, and if the test succeeds, click Deploy.
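For the Environmental Credentials option, the referenced IAM policy procedure is the authoritative source for the required permissions. As a rough illustration of what such a policy document tends to look like, the following Python sketch builds one for a list of buckets. The function name, the bucket name, the Sid values, and the chosen set of S3 actions are assumptions made for this example, not values taken from FortiSIEM.

```python
import json


def build_archive_bucket_policy(bucket_names):
    """Build an IAM policy document allowing list, read, write, and delete on the buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Sid": "ListArchiveBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                # Object-level permissions apply to keys inside the bucket.
                "Sid": "ReadWriteArchiveObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }


# "example-archive-bucket" is a placeholder bucket name.
print(json.dumps(build_archive_bucket_policy(["example-archive-bucket"]), indent=2))
```

Compare the generated document with the policy described in the referenced procedure and adjust the actions to match it before attaching the policy to a role.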
Implementation Notes:
- AWS S3 buckets MUST be created prior to this configuration.
- When storing ClickHouse data in AWS S3, Fortinet recommends turning Bucket Versioning off, or suspending it (if it was previously enabled). This is because data in ClickHouse files may change and versioning will keep both copies of data - new and old. With time, the number of stale objects may increase, resulting in higher AWS S3 costs. If versioning was previously enabled for the bucket, Fortinet recommends suspending it and configuring a policy to delete non-current versions.
- Archive data will NOT be automatically purged by FortiSIEM or ClickHouse.
- The S3 archive folder is not created until the Worker performs its first archive to S3.
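The versioning recommendation above can be expressed as two bucket settings: a versioning status of Suspended and a lifecycle rule that expires non-current object versions. The sketch below builds both as plain Python dictionaries in the shape accepted by the boto3 S3 client. The function names, the rule ID, and the 7-day retention are assumptions chosen for illustration; apply_to_bucket requires the third-party boto3 package and valid AWS credentials, so it is defined here but not called.

```python
def suspended_versioning():
    """Versioning configuration that suspends versioning on a bucket."""
    return {"Status": "Suspended"}


def noncurrent_version_cleanup(days=7):
    """Lifecycle configuration that deletes non-current versions after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: rule covers the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


def apply_to_bucket(bucket, days=7):
    """Apply both settings to `bucket`; needs boto3 and AWS credentials."""
    import boto3  # third-party dependency, imported only when this function runs

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration=suspended_versioning()
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=noncurrent_version_cleanup(days)
    )
```

Note that put_bucket_lifecycle_configuration replaces any lifecycle configuration already present on the bucket, so merge existing rules first if the bucket has any.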
Configuring GCS for Archive
To configure GCS for Archive, follow these steps:
- Go to Admin > Setup > Storage.
- Click Archive, and select GCS.
- Enter the following information:
- Access Key ID: Access Key ID required to access the GCS bucket(s)
- Secret Access Key: The Secret Access Key associated with the Access Key ID to access the GCS bucket(s)
Note: See the Google IAM documentation for more information about keys.
- For Buckets:
- In the Bucket field, enter the bucket.
- If more Buckets are required, click + to add a new row.
Note: See the Google Cloud Storage documentation for more information about buckets.
- Click Test.
- If the test succeeds, click Deploy.
- Configure each ClickHouse Worker to use the configured GCS bucket.
- Navigate to Admin > License > Nodes, edit each Worker, check Archive GCS and choose the Bucket from the GCS Bucket drop-down.
- Click Test, and if the test succeeds, click Deploy.
- If the Supervisor is used as a ClickHouse node, take the following steps:
- Navigate to Admin > Setup > Storage, click Online, check Archive GCS and choose the Bucket from the GCS Bucket drop-down.
- Click Test, and if the test succeeds, click Deploy.
- Apply GCS as the new storage policy to the ClickHouse cluster by taking the following steps.
- Navigate to Admin > Settings > Database > ClickHouse Config.
- Add the GCS bucket(s) to your ClickHouse Cluster Configuration by using the appropriate Shard # > Replica # drop-down list.
- Click Test, and if the test succeeds, click Deploy.
Implementation Notes:
- GCS buckets MUST be created prior to this configuration.
- When storing ClickHouse data in GCS, Fortinet recommends turning Bucket Versioning off, or suspending it (if it was previously enabled). This is because data in ClickHouse files may change and versioning will keep both copies of data - new and old. With time, the number of stale objects may increase, resulting in higher GCS costs. If versioning was previously enabled for the bucket, Fortinet recommends suspending it and configuring a policy to delete non-current versions.
- Archive data will NOT be automatically purged by FortiSIEM or ClickHouse.
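For a GCS bucket where versioning was previously enabled, the policy to delete non-current versions can be written as an Object Lifecycle Management rule. The Python sketch below generates such a rule in the JSON layout used by GCS lifecycle configuration files. The function name and the 7-day retention are assumptions chosen for illustration, not values recommended by Fortinet.

```python
import json


def gcs_noncurrent_cleanup(days=7):
    """Lifecycle configuration that deletes non-current object versions after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    "isLive": False,  # match only non-current versions
                    "daysSinceNoncurrentTime": days,
                },
            }
        ]
    }


# Print the configuration so it can be saved to a file and reviewed.
print(json.dumps(gcs_noncurrent_cleanup(), indent=2))
```

One way to apply the saved file is the gsutil lifecycle set command; review the rule against your retention requirements first, because deleted versions cannot be recovered.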