
Incorrect CPU topology on single-NUMA, multi-socket systems leads to performance degradation for Pod #2798

Closed
@hanamantagoudvk

Description

Issue and Impact seen:

Due to the incorrect CPU topology generated by cadvisor (version 0.37), the CNF Pod is allocated CPU 1, which is a hyperthread sibling of CPU 0 (CPU 0 hosts most of the OS and other components). This leads to performance degradation of the Pod.
In earlier versions (k8s 1.18.9, cadvisor 0.35.0), CPUs were allocated from CPU 2-14, whereas with k8s 1.19.3 and cadvisor 0.37.0, CPUs are allocated from CPU 1-13.
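The report does not describe how kubelet picks these CPUs. As a rough illustration of why core grouping matters, the sketch below assumes a simplified allocator that first takes whole free cores and then fills any remainder with the lowest-numbered free threads; it is not kubelet's static CPU manager code. It also assumes CPU 0 is the only reserved CPU and that the machine has 16 sockets with one 2-thread core each, extrapolated from the truncated 0.35.0 excerpt (32 logical CPUs, consistent with the 0.37.0 output). The names `Topology`, `allocate_full_cores_first`, `GROUPED`, `FLATTENED` and `RESERVED` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: (socket_id, core_id) -> logical CPU ids that share that core.
Topology = Dict[Tuple[int, int], List[int]]


def allocate_full_cores_first(topo: Topology, reserved: Set[int], request: int) -> List[int]:
    """Simplified allocator sketch (an assumption, not kubelet's algorithm).

    1. Take every core whose threads are all free, while the remaining
       request still needs at least one whole core.
    2. Fill any remainder with the lowest-numbered free threads.
    """
    taken: List[int] = []
    for key in sorted(topo):
        cpus = sorted(topo[key])
        core_is_free = all(c not in reserved for c in cpus)
        if core_is_free and request - len(taken) >= len(cpus):
            taken.extend(cpus)
    leftovers = sorted(
        c for cpus in topo.values() for c in cpus
        if c not in reserved and c not in taken
    )
    taken.extend(leftovers[: request - len(taken)])
    return sorted(taken)


# Grouped layout assumed from the 0.35.0 excerpt: 16 sockets x 1 core x 2 threads.
GROUPED: Topology = {(s, 0): [2 * s, 2 * s + 1] for s in range(16)}
# Flattened layout as in the 0.37.0 output: all 32 CPUs in one core on socket 4.
FLATTENED: Topology = {(4, 0): list(range(32))}

RESERVED = {0}  # assumption: only CPU 0 is reserved for the OS
print(allocate_full_cores_first(GROUPED, RESERVED, 12))    # CPUs 2..13
print(allocate_full_cores_first(FLATTENED, RESERVED, 12))  # CPUs 1..12
```

With the grouped layout, the core {0, 1} is never "whole and free", so a 12-CPU request skips it and lands on CPUs 2-13. With the flattened layout there is no whole free core at all, so the sketch falls straight through to the lowest free thread IDs and starts at CPU 1. The sketch does not try to reproduce kubelet's remainder handling for the 13-CPU case in the report.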

Issue detected version:

k8s version 1.19.3, cadvisor version 0.37.0

Version where it works well:

k8s version 1.18.9, cadvisor version 0.35.0

Analysis done so far:

  1. We see this issue on single-NUMA, multi-socket systems.

  2. On version 0.35.0, the topology generated by cadvisor and fed to kubelet looks like the excerpt below.
    Here there is a clear mapping of node/socket to cores/threads.

     "topology": [
       {
         "node_id": 0,
         "memory": 50628362240,
         "hugepages": [
           {
             "page_size": 1048576,
             "num_pages": 8
           }
         ],
         "cores": [
           {
             "core_id": 0,
             "thread_ids": [
               0,
               1
             ],
             "caches": [
               {
                 "size": 4194304,
                 "type": "Unified",
                 "level": 2
               }
             ]
           }
         ],
         "caches": [
           {
             "size": 16777216,
             "type": "Unified",
             "level": 3
           }
         ]
       },
       {
         "node_id": 1,
         "memory": 0,
         "hugepages": null,
         "cores": [
           {
             "core_id": 0,
             "thread_ids": [
               2,
               3
             ],
             "caches": [
               {
                 "size": 4194304,
                 "type": "Unified",
                 "level": 2
               }
             ]
           }
         ],
         "caches": [
           {
             "size": 16777216,
             "type": "Unified",
             "level": 3
           }
         ]
       },
     ... 
     ...
     ]
    
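  3. On version 0.37.0, the topology generated by cadvisor and fed to kubelet looks like the output below.
    Here sockets and NUMA nodes appear to be mixed up: all 32 logical CPUs are reported as threads of a single core (core_id 0, socket_id 4) on node 0.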

     "topology": [
       {
         "node_id": 0,
         "memory": 50628362240,
         "hugepages": [
           {
             "page_size": 1048576,
             "num_pages": 8
           }
         ],
         "cores": [
           {
             "core_id": 0,
             "thread_ids": [
               0,
               1,
               10,
               11,
               12,
               13,
               14,
               15,
               16,
               17,
               18,
               19,
               2,
               20,
               21,
               22,
               23,
               24,
               25,
               26,
               27,
               28,
               29,
               3,
               30,
               31,
               4,
               5,
               6,
               7,
               8,
               9
             ],
             "caches": null,
             "socket_id": 4
           }
         ],
         "caches": null
       }
     ]
    
  4. We suspect that the following commit introduced the bug:

     commit c5a9232a94846cdd9f2a80b3a969a7717e880e10
     Author: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
     Date: Wed Mar 4 11:33:46 2020 +0100
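To spot this kind of regression without reading the whole JSON by eye, a quick sanity check can walk the `topology` array and flag any core that lists more threads than the CPU's SMT width, as well as any logical CPU listed under two different cores. This is a hypothetical helper (`suspicious_cores`, `V035`, `V037` are illustrative names, not part of cadvisor). It assumes 2 threads per core, which is typical for x86 hyperthreading but should be checked for the actual CPU, and it uses condensed versions of the two excerpts above with the caches, memory and hugepages fields dropped.

```python
import json
from typing import Dict, List


def suspicious_cores(topology_json: str, max_threads_per_core: int = 2) -> List[str]:
    """Return warnings for a cadvisor-style "topology" array.

    Flags (a) cores listing more threads than the assumed SMT width and
    (b) logical CPUs that appear under more than one core.
    """
    warnings: List[str] = []
    owner: Dict[int, str] = {}  # logical CPU id -> label of the core that listed it
    for node in json.loads(topology_json):
        for core in node.get("cores") or []:
            label = "node {} core {} (socket {})".format(
                node["node_id"], core["core_id"], core.get("socket_id", "?"))
            threads = sorted(int(t) for t in core["thread_ids"])
            if len(threads) > max_threads_per_core:
                warnings.append("{}: {} threads, expected at most {}".format(
                    label, len(threads), max_threads_per_core))
            for t in threads:
                if t in owner:
                    warnings.append("CPU {} listed under both {} and {}".format(
                        t, owner[t], label))
                owner[t] = label
    return warnings


# Condensed 0.35.0-style output: one 2-thread core per node (first two nodes only).
V035 = json.dumps([
    {"node_id": 0, "cores": [{"core_id": 0, "thread_ids": [0, 1]}]},
    {"node_id": 1, "cores": [{"core_id": 0, "thread_ids": [2, 3]}]},
])
# Condensed 0.37.0-style output: all 32 CPUs under a single core.
V037 = json.dumps([
    {"node_id": 0,
     "cores": [{"core_id": 0, "socket_id": 4, "thread_ids": list(range(32))}]},
])

print(suspicious_cores(V035))  # []
print(suspicious_cores(V037))  # one warning: 32 threads on node 0 core 0
```

Feeding it the real output (for example the full `topology` array from cadvisor's machine info) should return an empty list on a correctly detected 2-way SMT machine.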
    
