KVM Administration Guide

CPU pinning

vcpu is the vCPU number allocated to and seen by the guest. cpuset is the physical CPU thread on the host that the vCPU is pinned to. Compared with the earlier output in Tuned indirect update, only CPUs from one NUMA node are selected. This matches the NUMA node of the guest memory, and it should also be the NUMA node that the NIC is attached to.

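emulatorpin specifies which host physical CPUs the emulator threads are pinned to. In this definition, the emulator is pinned to the same set of physical CPUs that the vCPUs are pinned to.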

<vcpu placement='static'>32</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='41'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='43'/>
    <vcpupin vcpu='4' cpuset='9'/>
    <vcpupin vcpu='5' cpuset='45'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='47'/>
    <vcpupin vcpu='8' cpuset='13'/>
    <vcpupin vcpu='9' cpuset='49'/>
    <vcpupin vcpu='10' cpuset='15'/>
    <vcpupin vcpu='11' cpuset='51'/>
    <vcpupin vcpu='12' cpuset='17'/>
    <vcpupin vcpu='13' cpuset='53'/>
    <vcpupin vcpu='14' cpuset='19'/>
    <vcpupin vcpu='15' cpuset='55'/>
    <vcpupin vcpu='16' cpuset='21'/>
    <vcpupin vcpu='17' cpuset='57'/>
    <vcpupin vcpu='18' cpuset='23'/>
    <vcpupin vcpu='19' cpuset='59'/>
    <vcpupin vcpu='20' cpuset='25'/>
    <vcpupin vcpu='21' cpuset='61'/>
    <vcpupin vcpu='22' cpuset='27'/>
    <vcpupin vcpu='23' cpuset='63'/>
    <vcpupin vcpu='24' cpuset='29'/>
    <vcpupin vcpu='25' cpuset='65'/>
    <vcpupin vcpu='26' cpuset='31'/>
    <vcpupin vcpu='27' cpuset='67'/>
    <vcpupin vcpu='28' cpuset='33'/>
    <vcpupin vcpu='29' cpuset='69'/>
    <vcpupin vcpu='30' cpuset='35'/>
    <vcpupin vcpu='31' cpuset='71'/>
    <emulatorpin cpuset='5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71'/>
  </cputune>
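The pin list above follows a regular pattern: each pair of consecutive vCPUs lands on the two threads of one physical core. The following is a minimal Python sketch that regenerates the fragment from that pattern. The function name build_cputune and its parameters are hypothetical, and the sibling offset of 36 is an assumption taken from this example host; check the lstopo-no-graphics output before reusing it on other hardware.

```python
def build_cputune(first_threads, sibling_offset=36):
    """Return a libvirt <vcpu>/<cputune> fragment pinning two vCPUs per core.

    first_threads: physical CPU number of the first thread of each core.
    sibling_offset: assumed distance to the second thread on the same core
    (36 on the example host; not a general rule).
    """
    cpus = []
    for first in first_threads:
        # Even vCPU -> first thread, odd vCPU -> sibling thread of that core.
        cpus.extend([first, first + sibling_offset])

    lines = [f"<vcpu placement='static'>{len(cpus)}</vcpu>", "  <cputune>"]
    for vcpu, cpu in enumerate(cpus):
        lines.append(f"    <vcpupin vcpu='{vcpu}' cpuset='{cpu}'/>")
    # Pin the emulator threads to the same physical CPUs as the vCPUs.
    emulator = ",".join(str(c) for c in sorted(cpus))
    lines.append(f"    <emulatorpin cpuset='{emulator}'/>")
    lines.append("  </cputune>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Odd-numbered first threads 5..35 on NUMA node 1, as in the example.
    print(build_cputune(range(5, 36, 2)))
```

Generating the fragment rather than typing it by hand can help keep the vcpupin entries and the emulatorpin list consistent when the core selection changes.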

When considering CPU pinning for the best performance, hyperthreading may be a concern. Refer again to the lstopo-no-graphics output:

[root@rhel-tiger-14-6 ~]# lstopo-no-graphics
 Machine (187GB total)
 <output omitted for brevity>
   Package L#1
     NUMANode L#1 (P#1 94GB)
     L3 L#1 (25MB)
       L2 L#18 (1024KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
         PU L#36 (P#1)
         PU L#37 (P#37)
       L2 L#19 (1024KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
         PU L#38 (P#3)
         PU L#39 (P#39)
       L2 L#20 (1024KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
         PU L#40 (P#5)
         PU L#41 (P#41)
       L2 L#21 (1024KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
         PU L#42 (P#7)
         PU L#43 (P#43)
       L2 L#22 (1024KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22
         PU L#44 (P#9)
         PU L#45 (P#45)
       L2 L#23 (1024KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23
         PU L#46 (P#11)
         PU L#47 (P#47)
       L2 L#24 (1024KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24
         PU L#48 (P#13)
         PU L#49 (P#49)
       L2 L#25 (1024KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25
         PU L#50 (P#15)
         PU L#51 (P#51)
       L2 L#26 (1024KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26
         PU L#52 (P#17)
         PU L#53 (P#53)
       L2 L#27 (1024KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27
         PU L#54 (P#19)
         PU L#55 (P#55)
       L2 L#28 (1024KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28
         PU L#56 (P#21)
         PU L#57 (P#57)
       L2 L#29 (1024KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29
         PU L#58 (P#23)
         PU L#59 (P#59)
       L2 L#30 (1024KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30
         PU L#60 (P#25)
         PU L#61 (P#61)
       L2 L#31 (1024KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31
         PU L#62 (P#27)
         PU L#63 (P#63)
       L2 L#32 (1024KB) + L1d L#32 (32KB) + L1i L#32 (32KB) + Core L#32
         PU L#64 (P#29)
         PU L#65 (P#65)
       L2 L#33 (1024KB) + L1d L#33 (32KB) + L1i L#33 (32KB) + Core L#33
         PU L#66 (P#31)
         PU L#67 (P#67)
       L2 L#34 (1024KB) + L1d L#34 (32KB) + L1i L#34 (32KB) + Core L#34
         PU L#68 (P#33)
         PU L#69 (P#69)
       L2 L#35 (1024KB) + L1d L#35 (32KB) + L1i L#35 (32KB) + Core L#35
         PU L#70 (P#35)
         PU L#71 (P#71)
 <output omitted for brevity>
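Reading sibling threads out of this output by eye is error-prone on large hosts. The following is a minimal Python sketch that extracts the physical CPU numbers per core from text in the format shown above. The function name parse_core_threads and the SAMPLE excerpt are illustrative only, and the parser assumes the "Core L#n" and "PU L#n (P#m)" layout seen in this output.

```python
import re


def parse_core_threads(lstopo_text):
    """Map each core's logical index (Core L#n) to its physical PU numbers."""
    cores = {}
    current = None
    for line in lstopo_text.splitlines():
        core = re.search(r"Core L#(\d+)", line)
        if core:
            current = int(core.group(1))
            cores[current] = []
        # A PU may appear on its own line or on the same line as its core.
        pu = re.search(r"PU L#\d+ \(P#(\d+)\)", line)
        if pu and current is not None:
            cores[current].append(int(pu.group(1)))
    return cores


# Two cores copied from the output above.
SAMPLE = """\
       L2 L#20 (1024KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
         PU L#40 (P#5)
         PU L#41 (P#41)
       L2 L#21 (1024KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
         PU L#42 (P#7)
         PU L#43 (P#43)
"""

if __name__ == "__main__":
    for core, threads in parse_core_threads(SAMPLE).items():
        print(f"Core L#{core}: physical CPUs {threads}")
```

Each list with two entries identifies a pair of hyperthreads that share one core, which is the information needed when deciding how to assign cpuset values.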

To get the best performance from a vCPU, it may be necessary to avoid using the second thread on the physical core. The lstopo-no-graphics output shows that with hyperthreading, the two threads of a core share the same L1 and L2 cache, so two vCPUs pinned to those threads share it as well. For example, physical CPUs 5 and 41 (vCPUs 0 and 1 in the definition above) are both on Core L#20. This may lead to contention and performance degradation.

However, using the second thread to increase the number of vCPUs is better than not providing it to the VM. The consideration is thus how to use the FortiGate-VM license effectively across the hardware.
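To make the trade-off concrete, the following Python sketch compares the two layouts for the 16 cores used in the pin list above. The names CORES and pin_plan are hypothetical, and the core pairs are reconstructed from the example host only.

```python
# (first thread, second thread) for each core used in the pin list above.
CORES = [(first, first + 36) for first in range(5, 36, 2)]


def pin_plan(cores, use_siblings):
    """Return the physical CPUs to pin to, in vCPU order."""
    plan = []
    for first, second in cores:
        plan.append(first)
        if use_siblings:
            # Second hyperthread shares L1/L2 cache with the first.
            plan.append(second)
    return plan


if __name__ == "__main__":
    dedicated = pin_plan(CORES, use_siblings=False)
    shared = pin_plan(CORES, use_siblings=True)
    print(len(dedicated), "vCPUs, one per core:", dedicated)
    print(len(shared), "vCPUs, two per core:", shared)
```

With one thread per core, the VM gets 16 vCPUs that do not share L1 or L2 cache with another vCPU. With both threads, it gets 32 vCPUs that share those caches in pairs.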

Note

The L3 cache is shared across the whole NUMA node in this case, so there is potential for contention there.
