Incorrect CPU topology on single-NUMA, multi-socket systems leads to performance degradation for Pod #2798
Description
Issue and Impact seen:
Due to an incorrect CPU topology generated by cadvisor (version 0.37.0), the CNF Pod is allocated CPU 1, which is a hyperthread sibling of CPU 0 (CPU 0 hosts most of the OS and other components). This leads to performance degradation of the Pod.
In earlier versions (k8s 1.18.9, cadvisor 0.35.0), CPUs were allocated from CPU 2-14, whereas with k8s 1.19.3 and cadvisor 0.37.0, CPUs are allocated from CPU 1-13.
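To make the impact concrete, here is a minimal sketch that walks a cadvisor-style `topology` list and reports which CPUs in a given allocation share a physical core with CPU 0. The field names (`node_id`, `cores`, `thread_ids`) are taken from the topology excerpts quoted in this issue; the helper names `siblings_of` and `conflicting_cpus` are hypothetical, and this is not kubelet's CPU manager logic.

```python
from typing import Iterable, List, Set


def siblings_of(topology: List[dict], cpu: int) -> Set[int]:
    """Return the other hardware threads that share a physical core with `cpu`."""
    for node in topology:
        for core in node.get("cores") or []:
            threads = set(core.get("thread_ids") or [])
            if cpu in threads:
                return threads - {cpu}
    return set()


def conflicting_cpus(topology: List[dict], allocation: Iterable[int],
                     reserved_cpu: int = 0) -> List[int]:
    """CPUs in `allocation` that are hyperthread siblings of `reserved_cpu`."""
    return sorted(set(allocation) & siblings_of(topology, reserved_cpu))


# Trimmed topology shaped like the cadvisor 0.35.0 output quoted in this issue
# (only the first two nodes shown there; the rest was elided with "...").
topology_035 = [
    {"node_id": 0, "cores": [{"core_id": 0, "thread_ids": [0, 1]}]},
    {"node_id": 1, "cores": [{"core_id": 0, "thread_ids": [2, 3]}]},
]

print(conflicting_cpus(topology_035, range(1, 14)))  # [1]  (CPU 1-13, as seen with 0.37.0)
print(conflicting_cpus(topology_035, range(2, 15)))  # []   (CPU 2-14, as seen with 0.35.0)
```

Using the 0.35.0 layout as the reference, the CPU 1-13 allocation overlaps CPU 0's sibling while the CPU 2-14 allocation does not, which matches the degradation described above.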
Issue detected version:
k8s version 1.19.3, cadvisor version 0.37.0
Version where it works well:
k8s version 1.18.9, cadvisor version 0.35.0
Analysis done so far:
- We see this issue on single-NUMA, multi-socket systems.
- On version 0.35.0, the topology generated by cadvisor and fed to kubelet looks like the excerpt below. Here there is a clear mapping of node/socket to cores/threads:
  "topology": [ { "node_id": 0, "memory": 50628362240, "hugepages": [ { "page_size": 1048576, "num_pages": 8 } ], "cores": [ { "core_id": 0, "thread_ids": [ 0, 1 ], "caches": [ { "size": 4194304, "type": "Unified", "level": 2 } ] } ], "caches": [ { "size": 16777216, "type": "Unified", "level": 3 } ] }, { "node_id": 1, "memory": 0, "hugepages": null, "cores": [ { "core_id": 0, "thread_ids": [ 2, 3 ], "caches": [ { "size": 4194304, "type": "Unified", "level": 2 } ] } ], "caches": [ { "size": 16777216, "type": "Unified", "level": 3 } ] }, ... ... ]
- On version 0.37.0, the topology generated by cadvisor and fed to kubelet looks like the excerpt below. Socket and NUMA node appear to be mixed up: a single node contains a single core (core_id 0, socket_id 4) that lists all 32 thread IDs, and the cache information is null:
  "topology": [ { "node_id": 0, "memory": 50628362240, "hugepages": [ { "page_size": 1048576, "num_pages": 8 } ], "cores": [ { "core_id": 0, "thread_ids": [ 0, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 3, 30, 31, 4, 5, 6, 7, 8, 9 ], "caches": null, "socket_id": 4 } ], "caches": null } ]
- We suspect that the following commit introduced the bug:
  commit c5a9232a94846cdd9f2a80b3a969a7717e880e10
  Author: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
  Date: Wed Mar 4 11:33:46 2020 +0100
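A quick way to confirm a malformed topology like the 0.37.0 one is to flag cores that list more threads than the hardware's SMT width, and to cross-check against the kernel's own sibling list in sysfs (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`). The sketch below assumes two threads per core (the 0.35.0 excerpt shows pairs); the helper names `parse_cpulist`, `sysfs_siblings` and `oversized_cores` are hypothetical, not cadvisor APIs.

```python
from pathlib import Path
from typing import List, Set, Tuple


def parse_cpulist(text: str) -> Set[int]:
    """Parse the Linux cpulist format used in sysfs, e.g. "0-1" or "0,2,4-6"."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def sysfs_siblings(cpu: int) -> Set[int]:
    """Kernel's view of the threads sharing a core with `cpu` (Linux only)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpulist(path.read_text())


def oversized_cores(topology: List[dict],
                    max_threads_per_core: int = 2) -> List[Tuple[int, int, int]]:
    """Return (node_id, core_id, thread_count) for cores with too many threads."""
    flagged = []
    for node in topology:
        for core in node.get("cores") or []:
            count = len(core.get("thread_ids") or [])
            if count > max_threads_per_core:
                flagged.append((node["node_id"], core["core_id"], count))
    return flagged


# Shapes of the two excerpts quoted above (0.35.0 trimmed where it was elided).
topology_035 = [
    {"node_id": 0, "cores": [{"core_id": 0, "thread_ids": [0, 1]}]},
    {"node_id": 1, "cores": [{"core_id": 0, "thread_ids": [2, 3]}]},
]
topology_037 = [
    # sorted(..., key=str) reproduces the string-like ordering in the 0.37.0 output
    {"node_id": 0, "cores": [{"core_id": 0, "socket_id": 4,
                              "thread_ids": sorted(range(32), key=str)}]},
]

print(oversized_cores(topology_035))  # []
print(oversized_cores(topology_037))  # [(0, 0, 32)]
```

On the affected host, `sysfs_siblings(0)` gives the kernel's answer; if CPU 1 is indeed CPU 0's hyperthread sibling, as stated above, the sysfs file should read `0-1`, which the 0.35.0 topology reflects and the 0.37.0 topology does not.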