Proposal: IP addresses as an opaque counted resource #5507

Closed as not planned
@gmarek

Description

IP addresses as an opaque counted resource

Currently the Kubernetes scheduler is not aware of the number of IP addresses available on a single node. We do make some assumptions about network masks, from which the number of available IPs can be inferred, but this is not general or configurable in any way. We want to create a simple counted resource, published by the Kubelet and visible to the scheduler, which will allow finer-grained configurations to be used.

Resources in Kubernetes

Resource validation in Kubernetes is performed as one of the scheduler's FitPredicates, namely the ResourceFit predicate check. Available resources are computed by reading a node's resources from NodeInfo and subtracting the resources of the Pods currently bound to that Node. The data comes (indirectly?) from etcd, in the form of the API's Node > Spec > Capacity struct.
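
As a rough illustration of that flow, here is a minimal Go sketch of the idea behind the ResourceFit check; the types and the `fits` helper are simplified stand-ins, not the actual scheduler or API structs:

```go
// Sketch of the ResourceFit idea with simplified types; the real predicate
// lives in the scheduler and works on the API's resource structs.
package main

import "fmt"

// Resources is a simplified stand-in for the API ResourceList
// (CPU in millicores, memory in bytes).
type Resources struct {
	MilliCPU int64
	Memory   int64
}

type Pod struct {
	Name     string
	Requests Resources
}

type Node struct {
	Name     string
	Capacity Resources // comes from Node > Spec > Capacity
	Pods     []Pod     // pods currently bound to this node
}

// fits reports whether the pod's requests fit into what remains of the
// node's capacity after subtracting the requests of already-bound pods.
func fits(pod Pod, node Node) bool {
	used := Resources{}
	for _, p := range node.Pods {
		used.MilliCPU += p.Requests.MilliCPU
		used.Memory += p.Requests.Memory
	}
	return used.MilliCPU+pod.Requests.MilliCPU <= node.Capacity.MilliCPU &&
		used.Memory+pod.Requests.Memory <= node.Capacity.Memory
}

func main() {
	node := Node{
		Name:     "node-1",
		Capacity: Resources{MilliCPU: 2000, Memory: 4 << 30},
		Pods:     []Pod{{Name: "a", Requests: Resources{MilliCPU: 1500, Memory: 1 << 30}}},
	}
	pod := Pod{Name: "b", Requests: Resources{MilliCPU: 600, Memory: 1 << 30}}
	fmt.Println(fits(pod, node)) // false: only 500m CPU remains on node-1
}
```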

Capacity is part of NodeStatus, periodically updated by the Kubelet in the tryUpdateNodeStatus method (even though Capacity is part of the NodeSpec). As of now, all Node capacity data is read from the cAdvisor MachineInfo struct. During Node creation the NodeController fills in Capacity in some way (either statically or from the cloud provider), but later on it is overwritten by the data provided by cAdvisor. IIUC the Kubelet is the only entity that modifies Node entries after they are created.
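
A minimal sketch of what that status-sync path could look like with an IP count folded in next to the cAdvisor-derived values; the types, field names, and `buildCapacity` helper below are illustrative assumptions, not the actual Kubelet code:

```go
// Sketch of how a status sync could fold an IP count into Capacity
// alongside the cAdvisor-derived values.
package main

import "fmt"

// machineInfo mimics the cAdvisor MachineInfo fields used for capacity.
type machineInfo struct {
	NumCores       int
	MemoryCapacity int64
}

// capacity is a simplified Node > Spec > Capacity.
type capacity struct {
	MilliCPU int64
	Memory   int64
	IPs      int64 // the proposed counted resource
}

// buildCapacity combines cAdvisor data with a configured IP count,
// mirroring how the Kubelet overwrites Capacity on each status sync.
func buildCapacity(mi machineInfo, ipCount int64) capacity {
	return capacity{
		MilliCPU: int64(mi.NumCores) * 1000,
		Memory:   mi.MemoryCapacity,
		IPs:      ipCount,
	}
}

func main() {
	mi := machineInfo{NumCores: 4, MemoryCapacity: 8 << 30}
	// e.g. a /24 pod subnet minus reserved addresses.
	fmt.Printf("%+v\n", buildCapacity(mi, 254))
}
```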

IPs as resources

Introducing IP addresses as resources is straightforward: we need to add them to ResourceList/ResourceName and related places, and modify the PodFitResources method in the scheduler. The current model assumes one IP per Pod, so there is no need to introduce additional scheduling requirements or usage info. If we want to loosen this assumption in the future, the required changes will be straightforward.
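
Under the one-IP-per-Pod assumption, the scheduler-side check reduces to counting bound Pods against the node's IP capacity. A rough sketch of that check follows; the names are illustrative and do not match the real PodFitResources signature:

```go
// Sketch of the one-IP-per-pod fit check: since every pod is assumed to
// consume exactly one IP, "IPs requested" is simply the pod count.
package main

import "fmt"

type nodeInfo struct {
	IPCapacity int64 // proposed counted resource on the node
	BoundPods  int64 // pods already bound to the node
}

// podFitsIPs reports whether one more pod (hence one more IP) fits.
func podFitsIPs(n nodeInfo) bool {
	return n.BoundPods+1 <= n.IPCapacity
}

func main() {
	fmt.Println(podFitsIPs(nodeInfo{IPCapacity: 254, BoundPods: 253})) // true: last IP available
	fmt.Println(podFitsIPs(nodeInfo{IPCapacity: 254, BoundPods: 254})) // false: IPs exhausted
}
```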

On top of those simple changes we need to decide where to keep the configuration. As the value resides in NodeSpec, which is written by both the Kubelet and the NodeController, it can be part of either configuration (Node or Kubelet). If it were part of the Kubelet configuration, we would need to wait for the first “syncNodeStatus” call before it is set. On the upside, doing it in the Kubelet makes changing the IP count at runtime more robust (the Kubelet is, and will keep on, overwriting the Spec, so if anything else were to modify the value a race condition would be possible).
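
If we went the Kubelet route, the setup could look roughly like the sketch below: a hypothetical --max-pod-ips flag whose value is republished as capacity on every status sync, so a competing writer would simply be overwritten. The flag name and helper are assumptions for illustration only:

```go
// Sketch of the Kubelet-side option: a hypothetical flag whose value is
// written into Capacity on each status sync.
package main

import (
	"flag"
	"fmt"
)

var maxPodIPs = flag.Int64("max-pod-ips", 254,
	"number of pod IP addresses available on this node (hypothetical flag)")

// syncNodeStatus stands in for the periodic status update that would
// publish the configured IP count as node capacity.
func syncNodeStatus(publish func(ips int64)) {
	publish(*maxPodIPs)
}

func main() {
	flag.Parse()
	syncNodeStatus(func(ips int64) {
		fmt.Printf("capacity: ip-addresses=%d\n", ips)
	})
}
```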

Metadata


Labels

    kind/feature, lifecycle/rotten, priority/backlog, sig/network, sig/scheduling
