Proposal: IP addresses as an opaque counted resource #5507
Currently, the Kubernetes scheduler is not aware of the number of IP addresses available on a single node. We do have some assumptions about network masks, from which the number of available IPs can be inferred, but this is neither general nor configurable in any way. We want to create a simple counted resource, published by the Kubelet and visible to the scheduler, which will allow finer-grained configurations to be used.
Resources in Kubernetes
Resource validation in Kubernetes is performed as one of the scheduler's FitPredicates, namely the ResourceFit predicate check. Available resources are computed by reading resources from NodeInfo and subtracting the resources of the Pods currently bound to the given Node. The data comes (indirectly?) from etcd, in the form of the API Node > Spec > Capacity struct.
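To make the subtraction concrete, here is a minimal sketch of such a check. The names and types (ResourceList as a plain map, available, fits) are illustrative simplifications, not the real scheduler API:

```go
package main

import "fmt"

// ResourceList mirrors the API's map of resource name -> quantity
// (simplified here to plain int64 quantities).
type ResourceList map[string]int64

// available subtracts the resources requested by Pods already bound
// to the Node from the Node's declared capacity.
func available(capacity ResourceList, pods []ResourceList) ResourceList {
	out := ResourceList{}
	for name, qty := range capacity {
		out[name] = qty
	}
	for _, pod := range pods {
		for name, qty := range pod {
			out[name] -= qty
		}
	}
	return out
}

// fits reports whether a new Pod's requests fit into what is left.
func fits(request, avail ResourceList) bool {
	for name, qty := range request {
		if avail[name] < qty {
			return false
		}
	}
	return true
}

func main() {
	capacity := ResourceList{"cpu": 4000, "memory": 8 << 30}
	bound := []ResourceList{{"cpu": 1000, "memory": 2 << 30}}
	request := ResourceList{"cpu": 500, "memory": 1 << 30}
	fmt.Println(fits(request, available(capacity, bound))) // true
}
```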
Capacity is part of the NodeStatus periodically updated by the Kubelet in its tryUpdateNodeStatus method (even though Capacity is a part of the NodeSpec). As of now, all Node capacity data is read from cAdvisor's MachineInfo struct. During Node creation the NodeController fills in Capacity in some way (either statically or by getting it from the cloud provider), but later on it is overwritten by the data provided by cAdvisor. IIUC the Kubelet is the only entity which triggers modifications of Node entries after they are created.
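The overwrite flow might look roughly like the sketch below. MachineInfo and Node here are simplified stand-ins for the real cAdvisor and API types, kept only to show that the Kubelet's periodic sync replaces whatever the NodeController set at creation:

```go
package main

import "fmt"

// MachineInfo is a simplified stand-in for cAdvisor's MachineInfo struct.
type MachineInfo struct {
	NumCores       int
	MemoryCapacity int64
}

// Node is a simplified stand-in for the API Node object.
type Node struct {
	Capacity map[string]int64
}

// tryUpdateNodeStatus mimics the Kubelet overwriting the capacity the
// NodeController filled in at creation time with cAdvisor-derived values.
func tryUpdateNodeStatus(node *Node, mi MachineInfo) {
	node.Capacity = map[string]int64{
		"cpu":    int64(mi.NumCores) * 1000, // millicores
		"memory": mi.MemoryCapacity,
	}
}

func main() {
	// Placeholder capacity set by the NodeController at creation.
	node := &Node{Capacity: map[string]int64{"cpu": 0}}
	tryUpdateNodeStatus(node, MachineInfo{NumCores: 4, MemoryCapacity: 8 << 30})
	fmt.Println(node.Capacity) // cAdvisor-derived values win
}
```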
IPs as resources
Introducing IP addresses as a resource is straightforward: we need to add them to ResourceList/ResourceName and related places, and modify the PodFitsResources method in the scheduler. The current model assumes one IP per Pod, hence there is no need to introduce additional scheduling requirements or usage info. If we want to loosen this assumption in the future, the changes to make will be pretty straightforward.
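Under the one-IP-per-Pod assumption the check degenerates to counting Pods, as in this sketch (the resource name "ips" and the helper podFitsIPs are hypothetical, chosen for illustration):

```go
package main

import "fmt"

// ResourceIPs is a hypothetical new entry alongside cpu/memory
// in the ResourceName constants.
const ResourceIPs = "ips"

// podFitsIPs checks the counted IP resource: with one IP per Pod,
// usage is simply the number of Pods already bound to the Node.
func podFitsIPs(ipCapacity int64, podsOnNode int) bool {
	return int64(podsOnNode)+1 <= ipCapacity
}

func main() {
	fmt.Println(podFitsIPs(256, 255)) // true: the 256th Pod still fits
	fmt.Println(podFitsIPs(256, 256)) // false: the Node is out of IPs
}
```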
On top of those simple changes we need to decide how to keep the configuration. As it resides in NodeSpec, which is written by both the Kubelet and the NodeController, it can be part of either configuration (Node or Kubelet). If it were part of the Kubelet configuration, we would need to wait for the first syncNodeStatus call before it is set up. On the upside, doing it in the Kubelet makes changing the number of IPs at runtime more robust (the Kubelet is, and will be, overwriting Specs, so if anything else were to modify the value, a race condition would be possible).
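A minimal sketch of the Kubelet-side option, assuming a hypothetical --max-ips flag (not an existing Kubelet flag): the configured value is re-asserted into Capacity on every status sync, so edits by other actors are overwritten rather than raced against:

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical flag: the number of IP addresses available for Pods on this node.
var maxIPs = flag.Int64("max-ips", 254, "IP addresses available for pods on this node")

// syncNodeStatus re-asserts the configured IP capacity on every sync,
// mimicking the Kubelet's ownership of the field.
func syncNodeStatus(capacity map[string]int64) {
	capacity["ips"] = *maxIPs
}

func main() {
	flag.Parse()
	capacity := map[string]int64{"cpu": 4000}
	syncNodeStatus(capacity)
	fmt.Println(capacity)
}
```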