KVM Administration Guide

Tuned indirect update

You can use tuned for CPU partitioning, that is, isolating CPUs from use by the Linux OS, apart from a CPU usage measurement that the kernel makes every second. For more information, see TUNED_PROFILES_CPU_PARTITIONING(7). You must tune the partitioning to the hardware, depending on the number of CPUs and how they are distributed across NUMA nodes.

The following example walks through the considerations on a Dell PowerEdge R740:

[root@rhel-tiger-14-6 ~]# lscpu | egrep "^(NUMA |Socket|Core|Thread)"
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71

[root@rhel-tiger-14-6 ~]# tuned-adm active
Current active profile: throughput-performance

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | wc -l
72

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
performance

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq | sort -u
3700000

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq | sort -u
1200000

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq | sort -u
cat: '/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq': No such file or directory

In this case, there are two NUMA nodes with one socket each. Each socket has 18 cores with two threads per core, equating to 36 vCPUs per NUMA node. Considering the FortiGate-VM sizing options, using 32 vCPUs per NUMA node for VMs makes sense. This leaves four vCPUs per NUMA node for the host machine to use for housekeeping.
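
The sizing arithmetic above can be sketched in shell. This is a minimal illustration, assuming CPUs are spread evenly across NUMA nodes as on the example host. The function names cpus_per_numa and housekeeping_per_numa are hypothetical helpers, not part of tuned or lscpu:

```shell
#!/usr/bin/env bash
# Hypothetical helper: logical CPUs per NUMA node from lscpu-style figures.
# Assumes an even spread of CPUs across NUMA nodes.
cpus_per_numa() {
    local sockets="$1" cores_per_socket="$2" threads_per_core="$3" numa_nodes="$4"
    echo $(( sockets * cores_per_socket * threads_per_core / numa_nodes ))
}

# Hypothetical helper: CPUs left for the host on each node
# after reserving vm_cpus of them for VMs.
housekeeping_per_numa() {
    local per_numa="$1" vm_cpus="$2"
    echo $(( per_numa - vm_cpus ))
}
```

For the example host, cpus_per_numa 2 18 2 2 prints 36, and housekeeping_per_numa 36 32 prints 4.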

The current tuned profile is throughput-performance, which has some relevance to the CPU scaling governor settings. The outputs here are for comparison: you can rerun these commands once you have configured tuned as desired. The current frequency is not available in this case, as cpuinfo_cur_freq does not exist. The following commands install and activate the cpu-partitioning profile:

[root@rhel-tiger-14-6 ~]# yum -y install tuned-profiles-cpu-partitioning
<output omitted for brevity>

[root@rhel-tiger-14-6 ~]# touch /etc/tuned/cpu-partitioning-variables.conf

[root@rhel-tiger-14-6 ~]# chown root:root /etc/tuned/cpu-partitioning-variables.conf

[root@rhel-tiger-14-6 ~]# chmod 644 /etc/tuned/cpu-partitioning-variables.conf

[root@rhel-tiger-14-6 ~]# cat /etc/tuned/cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_balance_cores=5-10
isolated_cores=4-35,40-71
no_balance_cores=4-35,40-71

[root@rhel-tiger-14-6 ~]# tuned-adm profile cpu-partitioning

[root@rhel-tiger-14-6 ~]# tuned-adm active
Current active profile: cpu-partitioning

[root@rhel-tiger-14-6 ~]# reboot

In this example, vCPUs 0-3 and 36-39 are not declared and are the housekeeping resources, while the others are used in VMs.
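
To double-check which CPUs remain for housekeeping, you can compute the complement of the isolated_cores list. The following is a minimal sketch, assuming CPUs are numbered 0 to total-1 and the list uses the same comma and range syntax as the configuration file. The function names expand_cpu_list and housekeeping_cpus are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand "4-35,40-71" into one CPU number per line.
expand_cpu_list() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r start end; do
        # A single CPU such as "5" has no range end; treat it as 5-5.
        seq "$start" "${end:-$start}"
    done
}

# Hypothetical helper: print the CPUs in 0..total-1 that are NOT isolated.
housekeeping_cpus() {
    local total="$1" isolated="$2"
    expand_cpu_list "$isolated" | awk -v total="$total" '
        { isolated[$1] = 1 }
        END {
            out = ""
            for (cpu = 0; cpu < total; cpu++)
                if (!(cpu in isolated))
                    out = out (out == "" ? "" : ",") cpu
            print out
        }'
}
```

For the example host, housekeeping_cpus 72 "4-35,40-71" prints 0,1,2,3,36,37,38,39.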

When the tuned profile is activated, it embeds its changes into the kernel command line via GRUB:

[root@rhel-tiger-14-6 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-305.25.1.el8_4.x86_64 root=/dev/mapper/vg1-root ro crashkernel=auto resume=/dev/mapper/vg1-swap rd.lvm.lv=vg1/root rd.lvm.lv=vg1/swap rhgb quiet intel-iommu=on iommu=pt hugepagesz=1G default_hugepagesz=1G hugepages=160 transparent_hugepage=never selinux=0 skew_tick=1 nohz=on nohz_full=4-35,40-71 rcu_nocbs=4-35,40-71 tuned.non_isolcpus=000000f0,0000000f intel_pstate=disable nosoftlockup

tuned has added the following parameters:

| Parameter | Description |
| --- | --- |
| skew_tick=1 | Ensures that the ticks per CPU do not occur simultaneously by skewing their start times. Skewing the start times of the per-CPU timer ticks decreases the potential for lock conflicts, reducing system jitter for interrupt response times. |
| nohz=on | Turns off the timer tick on an idle CPU. |
| nohz_full=4-35,40-71 | Turns off the timer tick on a CPU when there is only one runnable task on that CPU. Requires nohz to be set to on. |
| rcu_nocbs=4-35,40-71 | Offloads RCU callback processing from the listed CPUs to kernel threads, allowing the user to move all RCU offload threads to a housekeeping CPU. |
| tuned.non_isolcpus=000000f0,0000000f | The CPU mask of the CPUs left for the host to use. In this example, 000000f0,0000000f => 0x000000F00000000F => CPUs 0-3 and 36-39 (cf. CPU Affinity Calculator). |
| intel_pstate=disable | Disables the Intel P-state driver so that it does not manage CPU performance states and frequency. |
| nosoftlockup | Prevents the kernel from detecting soft lockups in user threads. |
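
The tuned.non_isolcpus mask can also be decoded without an online calculator. The following is a minimal sketch, assuming the mask is a comma-separated list of 32-bit hexadecimal words with the most significant word first, as in the example above. The function name mask_to_cpus is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a CPU mask such as "000000f0,0000000f"
# into a comma-separated list of CPU numbers.
mask_to_cpus() {
    local n i=0 word base value bit
    # Number of 32-bit words in the mask.
    n=$(printf '%s\n' "$1" | tr ',' '\n' | wc -l)
    printf '%s\n' "$1" | tr ',' '\n' | while read -r word; do
        # The leftmost word covers the highest CPU numbers.
        base=$(( (n - 1 - i) * 32 ))
        value=$(( 0x$word ))
        bit=0
        while [ "$bit" -lt 32 ]; do
            if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                echo $(( base + bit ))
            fi
            bit=$(( bit + 1 ))
        done
        i=$(( i + 1 ))
    done | sort -n | paste -sd, -
}
```

For the example mask, mask_to_cpus "000000f0,0000000f" prints 0,1,2,3,36,37,38,39, which matches the housekeeping CPUs.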

The isolcpus parameter is considered deprecated. Instead, tuned uses cpusets and CPU affinity to partition CPUs. It does what it can to keep kernel noise and housekeeping away from the CPUs intended for use by VMs.

Using the lowest-numbered physical cores for housekeeping is good practice. You can use lstopo-no-graphics to ensure that you select the appropriate ones. The following shows a snippet of the lstopo-no-graphics output:

[root@rhel-tiger-14-6 ~]# lstopo-no-graphics
Machine (187GB total)
  Package L#0
    NUMANode L#0 (P#0 93GB)
    L3 L#0 (25MB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#36)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#2)
        PU L#3 (P#38)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
        PU L#4 (P#4)
        PU L#5 (P#40)
<output omitted for brevity>
  Package L#1
    NUMANode L#1 (P#1 94GB)
    L3 L#1 (25MB)
      L2 L#18 (1024KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
        PU L#36 (P#1)
        PU L#37 (P#37)
      L2 L#19 (1024KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
        PU L#38 (P#3)
        PU L#39 (P#39)
      L2 L#20 (1024KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
        PU L#40 (P#5)
        PU L#41 (P#41)
<output omitted for brevity>
  • Core L#0 = Physical Core with hwloc index 0
  • PU L#0 (P#0) = Processing Unit with hwloc index 0: processor 0
  • PU L#1 (P#36) = Processing Unit with hwloc index 1: processor 36

You can see how the vCPU number relates to the physical core. In this case, physical core 0 has two threads identified as vCPU 0 and vCPU 36.
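
To list the core-to-thread mapping for every core at once, you can summarize the lstopo-no-graphics output. The following is a minimal sketch, assuming the text layout shown in the snippet above (a line containing "Core L#n" followed by its "PU L#n (P#n)" lines). The function name core_threads is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lstopo-no-graphics text on stdin and print
# one line per physical core with the processor numbers of its threads.
core_threads() {
    awk '
        /Core L#[0-9]+/ {
            # A new core starts: flush the previous one, if any.
            if (core != "") print "core " core ": " pus
            match($0, /Core L#[0-9]+/)
            core = substr($0, RSTART + 7, RLENGTH - 7)
            pus = ""
            next
        }
        /PU L#[0-9]+ \(P#[0-9]+\)/ {
            # Extract the OS processor number from "(P#n)".
            match($0, /\(P#[0-9]+\)/)
            cpu = substr($0, RSTART + 3, RLENGTH - 4)
            pus = (pus == "" ? cpu : pus "," cpu)
        }
        END { if (core != "") print "core " core ": " pus }
    '
}
```

You would typically run it as lstopo-no-graphics | core_threads. On the example host, the first line of output would read core 0: 0,36.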

tuned has also taken care of the CPU scaling governor. Clock scaling allows changing the CPU clock speed on the fly between a minimum and maximum value. This is useful for some compute hardware, but for a performant system, the CPU must work at the maximum frequency it can, so the governor must be in performance mode:

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | wc -l
72

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
performance

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq | sort -u
3001000

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq | sort -u
1200000

[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq | sort -u
3001000

The change in scaling_max_freq from 3700000 to 3001000 is due to the addition of intel_pstate=disable to the kernel command line: the CPU is no longer overclocked. Overclocking is not advised as it makes performance less predictable. When using vSPU, overclocking leads to the CPUs being permanently overclocked, which could lead to problems such as overheating.

In the preceding outputs, you can see that all 72 CPUs are set to performance mode and are operating at the maximum frequency that they can support without overclocking. You configure this setting per CPU. As a minimum requirement, configure all CPUs used for the FortiGate-VM this way.
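
Because the governor is set per CPU, it can help to report any CPU that deviates rather than only counting unique values. The following is a minimal sketch, assuming the sysfs layout used in the commands above. The function name check_governor and its optional second argument (an alternative root directory, useful for testing) are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that every CPU uses the expected governor.
# Prints each deviating file and returns 1 if any CPU differs,
# or 2 if no scaling_governor files were found at all.
check_governor() {
    local expected="$1"
    local root="${2:-/sys/devices/system/cpu}"
    local file current checked=0 bad=0
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$file" ] || continue
        checked=$(( checked + 1 ))
        current=$(cat "$file")
        if [ "$current" != "$expected" ]; then
            echo "$file: $current"
            bad=1
        fi
    done
    if [ "$checked" -eq 0 ]; then
        echo "no scaling_governor files found under $root" >&2
        return 2
    fi
    return "$bad"
}
```

Running check_governor performance on the host prints nothing and returns 0 when every CPU is in performance mode.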
