In-Place Pod Vertical Scaling feature #92127

vinaykul · 2020-06-14T22:56:42Z

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
This PR implements API changes defined in the In-Place Pod Vertical Scaling KEP , the CRI-API changes defined in Kubelet CRI KEP , and the core implementation of In-Place Pod Vertical Scaling feature that allows users or entities such as VPA to request pod resize without restarting containers.

To summarize changes from the two KEPs referenced above, this PR brings the following:

Adds new fields named ResourcesAllocated and ResizePolicy to Pod's Container type, and Resources field to ContainerStatus type. It also adds a new admission controller that limits the ability to change ResourcesAllocated field to system-node account. This API change enables users to scale a Pod's CPU and memory resources up or down.
The CRI change enables runtimes to report currently applied resource limits for a container, and adds support for Windows containers.

The API change code had an initial review by @thockin , and the core implementation has been reviewed by @dashpole and revised. See: vinaykul#1

We are now at a point where we can use further review from the community.

Alpha Graduation Criteria Status

API change and core functionality implemented.
Resource quota, Limit ranger, scheduler changes, support for kubectl describe is in.
Resize policy has been implemented.
Unit tests for the core feature changes are in place.
E2E tests: Just started working on it.
Manually tested the at-least following E2E scenarios using kubectl patch:
- Single container (guaranteed, burstable) Pod, increase resources: CPU + Memory, CPU only, Memory only.
- Single container (guaranteed, burstable) Pod, decrease resources: CPU + Memory, CPU only, Memory only.
- Three containers (guaranteed, burstable) Pod, increase CPU of c1, decrease c2, no change for c3, decrease memory of c1, no change for c2 mem, increase memory of c3.
- Restart kubelet while pod resize is in progress.
- Inject random failure in dockershim response to UpdateContainerResources CRI API.
- Dockershim does not return Resources information in ContainerStatus API.
- Three containers (guaranteed) with RestartContainer resize policy for c2 memory - ensure that c2 is restarted only when c2 memory is resized, regardless of c2 cpu resize.

Which issue(s) this PR fixes:

Fixes #9043

Special notes for your reviewer:

The commit named 'InPlace Pod Vertical Scaling feature - API change' implements the API change for In-Place Pod Vertical Scaling.
The commit named 'InPlace Pod Vertical Scaling feature - Kubelet CRI change' implements the CRI changes to allow runtimes to report resource limits.
The commits named 'InPlace Pod Vertical Scaling feature - core implementation' contains the core kubelet changes that implements the feature, and changes to scheduler to use ResourcesAllocated field, and apiserver to support resource-quota, limit-ranger, and kubectl descibe outputs. Please look at the change to switch-over to ResourcesAllocated, and the resource-quota, limit-ranger changes closely - I have not had a lot of play-time with those.

Does this PR introduce a user-facing change?: Yes

1. A new field named ResourcesAllocated (type v1.ResourceList) is added to Container struct.
2. A new field named ResizePolicy is added to Container struct.
3. A new field named Resources (type v1.ResourceRequirements) is added to ContainerStatus struct.
4. A new admission controller named 'PodResourceAllocation' is introduced to limit access to modification of ResourcesAllocated field.

e.g: 
root@skibum:~/vInPlacePodVerticalScaling# cat ~/YML/1pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: 1pod
spec:
  containers:
  - name: stress
    image: skiibum/ubuntu-stress:18.10
    resources:
      limits:
        cpu: "1"
        memory: "1Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    resizePolicy:
    - resourceName: cpu
      policy: NoRestart
    - resourceName: memory
      policy: RestartContainer

The resources can be patched via API or via kubectl as illustrated below:

kubectl patch pod 1pod --patch '{"spec":{"containers":[{"name":"stress", "resources":{"requests":{"cpu":"1500m"}, "limits":{"cpu":"1500m"}}}]}}'

kubectl describe pod now displays resources allocated to the pod's containers as shown below:

root@skibum:~/vInPlacePodVerticalScaling# ./cluster/kubectl.sh describe pod 1pod
Name:         1pod
Namespace:    default
Priority:     0
Node:         127.0.0.1/127.0.0.1
Start Time:   Tue, 30 Jun 2020 14:58:02 -0700
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           172.18.0.3
IPs:
  IP:  172.18.0.3
Containers:
  stress:
    Container ID:   docker://481892996beb94ee0e0f90de3911f683419ea9d832f765ae762ce893af16feec
    Image:          skiibum/ubuntu-stress:18.10
    Image ID:       docker-pullable://skiibum/ubuntu-stress@sha256:161c50ff4ccc79744c29ea0f337652ce2854ce827914160c7043646b1f8988ba
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 30 Jun 2020 14:58:03 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  1Gi
    Requests:
      cpu:     1500m
      memory:  1Gi
    Allocations:
      cpu:        1500m
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lvrwt (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-lvrwt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lvrwt
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From                Message
  ----    ------     ----  ----                -------
  Normal  Scheduled  73s   default-scheduler   Successfully assigned default/1pod to 127.0.0.1
  Normal  Pulled     73s   kubelet, 127.0.0.1  Container image "skiibum/ubuntu-stress:18.10" already present on machine
  Normal  Created    73s   kubelet, 127.0.0.1  Created container stress
  Normal  Started    73s   kubelet, 127.0.0.1  Started container stress
root@skibum:~/vInPlacePodVerticalScaling#

Additionally, kubectl describe node displays total node resource allocations and CPU and memory resource allocations for all pods on the node as below:

root@skibum:~/vInPlacePodVerticalScaling# ./cluster/kubectl.sh describe node 127.0.0.1
Name:               127.0.0.1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=127.0.0.1
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 30 Jun 2020 14:52:23 -0700
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  127.0.0.1
  AcquireTime:     <unset>
  RenewTime:       Tue, 30 Jun 2020 15:04:13 -0700
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 30 Jun 2020 15:02:35 -0700   Tue, 30 Jun 2020 14:52:23 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 30 Jun 2020 15:02:35 -0700   Tue, 30 Jun 2020 14:52:23 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 30 Jun 2020 15:02:35 -0700   Tue, 30 Jun 2020 14:52:23 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 30 Jun 2020 15:02:35 -0700   Tue, 30 Jun 2020 14:52:33 -0700   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  127.0.0.1
  Hostname:    127.0.0.1
Capacity:
  cpu:                16
  ephemeral-storage:  3807464952Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32935964Ki
  pods:               110
Allocatable:
  cpu:                16
  ephemeral-storage:  3508959693954
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32833564Ki
  pods:               110
System Info:
  Machine ID:                 91a8ee87a44815729b015c5a5af38563
  System UUID:                49434D53-0200-2500-9072-002590726C8A
  Boot ID:                    5f7a62a7-8d6d-49a5-950b-b89a5a7561c0
  Kernel Version:             4.4.0-184-generic
  OS Image:                   Ubuntu 16.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.19.0-beta.2.529+426d0bbcb13489-dirty
  Kube-Proxy Version:         v1.19.0-beta.2.529+426d0bbcb13489-dirty
Non-terminated Pods:          (2 in total)
  Namespace                   Name                         CPU Requests  CPU Limits  CPU Allocations  Memory Requests  Memory Limits  Memory Allocations  AGE
  ---------                   ----                         ------------  ----------  ---------------  ---------------  -------------  ------------------  ---
  default                     1pod                         1500m (9%)    1500m (9%)  1500m (9%)       1Gi (3%)         1Gi (3%)       1Gi (3%)            6m13s
  kube-system                 kube-dns-5949897dd5-cjqdl    260m (1%)     0 (0%)      260m (1%)        110Mi (0%)       170Mi (0%)     110Mi (0%)          11m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits       Allocations
  --------           --------     ------       -----------
  cpu                1760m (11%)  1500m (9%)   1760m (11%)
  memory             1134Mi (3%)  1194Mi (3%)  1134Mi (3%)
  ephemeral-storage  0 (0%)       0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
  Type    Reason                   Age                From                   Message
  ----    ------                   ----               ----                   -------
  Normal  Starting                 11m                kubelet, 127.0.0.1     Starting kubelet.
  Normal  NodeHasSufficientMemory  11m (x2 over 11m)  kubelet, 127.0.0.1     Node 127.0.0.1 status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    11m (x2 over 11m)  kubelet, 127.0.0.1     Node 127.0.0.1 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     11m (x2 over 11m)  kubelet, 127.0.0.1     Node 127.0.0.1 status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  11m                kubelet, 127.0.0.1     Updated Node Allocatable limit across pods
  Normal  Starting                 11m                kube-proxy, 127.0.0.1  Starting kube-proxy.
  Normal  NodeReady                11m                kubelet, 127.0.0.1     Node 127.0.0.1 status is now: NodeReady
root@skibum:~/vInPlacePodVerticalScaling#

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

In-Place Update of Pod Resources KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20181106-in-place-update-of-pod-resources.md

Container Resources CRI API Changes for Pod Vertical Scaling KEP:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20191025-kubelet-container-resources-cri-api-changes.md

k8s-ci-robot · 2020-06-14T22:56:50Z

Welcome @vinaykul!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2020-06-14T22:56:51Z

Hi @vinaykul. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vinaykul · 2020-06-14T23:01:55Z

/assign @dashpole @thockin @liggitt @derekwaynecarr @dchen1107 @ahg-g @PatrickLang @yujuhong

neolit123 · 2020-06-30T17:54:46Z

/unassign

vinaykul · 2020-07-01T17:02:47Z

After discussions with @sig-cli in today's meeting, the kubectl describe changes in this PR are best deferred to when we get this feature into beta - they are a nice to have, not critical to the feature. I'll update the PR to back-out the kubectl changes along with moving pod_resize.go from test/e2e/common to test/e2e/node

thockin

This is a giant PR, so I only started with API things, and I apologize if some have been covered - hard to catch the whole historical state. I started re-reading the KEP and there are things there that are unclear to me - is it reasonable to send comments on that?

Unfortunately - commenting on a committed KEP is hard. I'll open a PR with <<[UNRESOLVED]>> blocks for discussion. Is that fair?

thockin · 2020-07-01T17:35:17Z

pkg/apis/core/types.go

@@ -1966,6 +1966,33 @@ const (
 	PullIfNotPresent PullPolicy = "IfNotPresent"
 )

+// ContainerResizePolicy specifies user guidance on how container resource resize should be handled.
+// Only one of the following container resize policies may be specified.
+// If none of the following policies is specified, it defaults to NoRestart.


The remarks about default behavior are out of place here, and should go on the field not the type.

Also, is that the safe default?

By safe, do you mean 'wont have any undesirable results'?

Assume 90% of users use the default. Are we going to behave correctly for most of them most of the time? It seems like "Restart" is the more conservative default, and "NoRestart" is more idealistic.

thockin · 2020-07-01T17:44:10Z

pkg/apis/core/types.go

+type ContainerResizePolicy string
+
+// These are the valid container resize policies:
+// NoRestart policy tells Kubelet to call UpdateContainerResources CRI API to resize


This needs to be clearer and comments should be grouped with the constants. E.g. here's my interpretation:

const ( // NoRestart indicates that this container does not require a restart // to have a specific resource (e.g. memory) resized. NoRestart ContainerResizePolicy = "NoRestart" // Restart indicates that this container requires a restart when a // specific resource (e.g. memory) is resized. For example, Java apps // which use the `-xmxN` flag would require to be restarted when the // memory resource is resized. Restart ContainerResizePolicy = "Restart" )

Note I renamed "RestartContainer" since "container" is redundant...

Do we need things like RestartOnGrow and RestartOnShrink ? Should this be RestartAlways?

These could be useful, especially RestartOnShrink since reclaiming memory may not be easily accomplished. @derekwaynecarr Should we add these now? I hope it is OK to updated a merged KEP after-the-fact.

We can always add them later, as long as we leave room for them. I don't like designing to hypotheticals, so maybe best to wait until we know we need them?

thockin · 2020-07-01T17:44:32Z

pkg/apis/core/types.go

+// ContainerResizePolicy specifies user guidance on how container resource resize should be handled.
+// Only one of the following container resize policies may be specified.
+// If none of the following policies is specified, it defaults to NoRestart.
+type ContainerResizePolicy string


Is this really "Policy" or "Requirement" ?

Naming wise, I feel like this should be ResourceResizeRequirement or ResourceResizePolicy, and the struct below should be ContainerResizePolicy

Thanks, I agree this looks cleaner. I'll update this to ResourceResizePolicy and ContainerResizePolicy, with values of Restart and NoRestart. RestartContainer got left that way because we had a third RestartPod option at one point which we dropped because we couldn't find a good documented use-case or reason to keep it in the IBM proposal (from where the resize policies came in the first place).

thockin · 2020-07-01T17:46:25Z

pkg/apis/core/types.go

+)
+
+// ResizePolicy represents the resource resize policy for a single container.
+type ResizePolicy struct {


ResourceList is one of the only places we use map in the API, but it fits well. There's a value to consistency - why is this better as a struct than a map?

@thockin I wanted this to be a map (KEP discussion) for simplicity (both for user and for code) despite the API conventions recommending lists of named subobjects preferred over maps .. cpu and memory are the limited key choices here vs container object where user can choose the name.

We discussed this during API live review in 2019 KubeCon San Diego, and we leaned towards staying with the API conventions as it makes future extensibility simple.

@liggitt Do you think we can revisit this? Can we make an exception here and use maps even though at this point we don't have any reason for the values to take parameters? I haven't really spent a lot of time seeing APIs evolve, but if you feel there are strong reasons for value to be objects instead of string (my preference because it is simple), we can do that. Are we looking at something like ContainerState?

If we make the "value" a struct, then a list probably makes more sense.

If we make the "value" a single value, I have a slight lean towards the map, for congruence with the resources field

Then we should leave it as a list - it is possible future iterations may want to qualify the behavior a bit more, such as number of failed update retry attempts before restarting.

thockin · 2020-07-01T17:48:13Z

pkg/apis/core/types.go

@@ -2053,6 +2080,12 @@ type Container struct {
 	// Compute resource requirements.
 	// +optional
 	Resources ResourceRequirements
+	// Node compute resources allocated to the container.
+	// +optional
+	ResourcesAllocated ResourceList


Why is this spec and not status ?

This field was added only so that we have a source-of-truth of actual node resource allocations to the containers and the pod, should the status field be erased or lost - API conventions need status to be reconstructible.

The status field now has a new Resources field (type ResourceRequirements) to reflect that actual config reported by the runtime via extended CRI

Wouldn't it be reconstructible from the Kubelet? Being in spec requires all the admission control mess which is a really string smell that something is off, IMO

Short answer - no.

For point-in-time, there is no way to discover memory request today - it is not stored anywhere. We can reconstruct current CPU requests, limits & memory limits from the extended ContainerStatus CRI API (not today) once runtime supports the CRI extension implemented in this PR.

We had discussed the alternative of checkpointing this locally on the node via atomic file-writes but Dawn mentioned they had run into issues with that kind of approach in Borg, and felt it was safest to have PodSpec be the single source-of-truth.

We had a long discussion on this in sig-node meetings before finally settling on adding ResourcesAllocated to PodSpec, and bringing it up during the Live API review last year. For gory details with kubelet restart scenarios, please see kubernetes/enhancements#686 (comment)

liggitt · 2020-07-01T18:03:20Z

plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go

+
+	// InPlacePodVerticalScaling
+	if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
+		nodePolicyRules = append(nodePolicyRules, rbacv1helpers.NewRule("patch").Groups(legacyGroup).Resources("pods").RuleOrDie())


cc @tallclair on node update permission on pod specs

/assign tallclair

liggitt · 2020-07-01T18:04:57Z

plugin/pkg/admission/resourcequota/controller.go

+				if a.GetResource().GroupResource() == corev1.Resource("pods") {
+					deltaUsage = quota.Subtract(deltaUsage, prevUsage)
+				}


this leaves non-pod resources with deltaUsage uncomputed... need a test case that would have caught this

remind me what protects against making requests that 1) reduce pod resource usage and 2) are constructed to fail to persist (e.g. a conflicting resourceVersion) and artificially lowering consumed quota?

Subtract - 🤦‍♂️ .. will fix and add test.

Regarding gaming of resource quota, we decided against using max(Resources.Requests, ResourcesAllocated) as we don't realistically expect a downward resize to stay unenforced by the kubelet for an extended period of time. Kubelet will always accept downward resize typically in the watch latency.

Please review the rationale noted on the KEP discussion for requests:

resource quota without max()

kubelet handling resize without max()

Similarly for limits, iirc we expected CRI to converge to the new desired limits shortly after watch latency.

liggitt · 2020-07-01T18:09:54Z

plugin/pkg/admission/podresourceallocation/admission.go

+		if !ok {
+			return apierrors.NewBadRequest("Resource was marked with kind Pod but was unable to be converted")
+		}
+		if len(pod.Spec.Containers) != len(oldPod.Spec.Containers) {


slight preference for only doing the resource preservation when the length is the same, and letting a length change continue on and get the normal validation error about not changing pod content

if len(pod.Spec.Containers) == len(oldPod.Spec.Containers) { ... preserve/populate values }

liggitt · 2020-07-01T18:10:12Z

plugin/pkg/admission/podresourceallocation/admission.go

+		if !ok {
+			return apierrors.NewBadRequest("Resource was marked with kind Pod but was unable to be converted")
+		}
+		if len(pod.Spec.Containers) != len(oldPod.Spec.Containers) {


what about initcontainers and ephemeralcontainers?

This feature is not supported for init-containers, we called it out as a non-goal. At that time, I was unaware of ephemeral containers so it is not covered in the KEP. But documentation on it suggested setting resources aren't allowed for ECs, so I didn't worry about it here. I haven't tested it, but I think normal validation will error out if user tries to set EC resources.

liggitt · 2020-07-01T18:13:42Z

plugin/pkg/admission/podresourceallocation/admission.go

+	if !utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
+		return nil
+	}


prefer injecting the feature gates over taking a direct dependency on the global feature gate. as an example, see:

kubernetes/plugin/pkg/admission/runtimeclass/admission.go

Lines 82 to 87 in c53ce48

// InspectFeatureGates allows setting bools without taking a dep on a global variable

func (r *RuntimeClass) InspectFeatureGates(featureGates featuregate.FeatureGate) {

r.runtimeClassEnabled = featureGates.Enabled(features.RuntimeClass)

r.podOverheadEnabled = featureGates.Enabled(features.PodOverhead)

r.inspectedFeatures = true

}

kubernetes/plugin/pkg/admission/runtimeclass/admission.go

Lines 99 to 106 in c53ce48

// ValidateInitialization implements the WantsExternalKubeInformerFactory interface.

func (r *RuntimeClass) ValidateInitialization() error {

if !r.inspectedFeatures {

return fmt.Errorf("InspectFeatureGates was not called")

}

if !r.runtimeClassEnabled {

return nil

}

kubernetes/plugin/pkg/admission/runtimeclass/admission.go

Lines 116 to 120 in c53ce48

// Admit makes an admission decision based on the request attributes

func (r *RuntimeClass) Admit(ctx context.Context, attributes admission.Attributes, o admission.ObjectInterfaces) error {

if !r.runtimeClassEnabled {

return nil

}

liggitt · 2020-07-01T18:15:02Z

pkg/api/pod/util.go

+	if !utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) && !inPlacePodVerticalScalingInUse(oldPodSpec) {
+		// Drop ResourcesAllocated and ResizePolicy fields. Don't drop updates to Resources field because
+		// template spec Resources field is mutable for certain controllers. Let ValidatePodUpdate handle it.
+		for i := range podSpec.Containers {


also initContainers/ephemeralContainers? should this use the VisitContainers helper?

Let me try out init and ephemeral containers and be sure about them. I think initContainers would need this.

liggitt · 2020-07-01T18:15:45Z

pkg/api/pod/util.go

+	}
+	var inUse bool
+	VisitContainers(podSpec, AllContainers, func(c *api.Container, containerType ContainerType) bool {
+		if (c.ResourcesAllocated != nil && len(c.ResourcesAllocated) > 0) || len(c.ResizePolicy) > 0 {


can simplify to

Suggested change

if (c.ResourcesAllocated != nil && len(c.ResourcesAllocated) > 0) || len(c.ResizePolicy) > 0 {

if len(c.ResourcesAllocated) > 0 || len(c.ResizePolicy) > 0 {

liggitt · 2020-07-01T18:18:39Z

pkg/api/v1/resource/helpers.go

+	case "resourcesAllocated.cpu":
+		return convertResourceCPUToString(container.ResourcesAllocated.Cpu(), divisor)
+	case "resourcesAllocated.memory":
+		return convertResourceMemoryToString(container.ResourcesAllocated.Memory(), divisor)
+	case "resourcesAllocated.ephemeral-storage":
+		return convertResourceEphemeralStorageToString(container.ResourcesAllocated.StorageEphemeral(), divisor)


this extends allowed values for the downward API in ways old kubelets won't understand, right?

It looks like older kubelet should fail validation if users try to expose these fields in their containers (a check I need to update to allow these fields).

End result is that pod won't create, so we won't get to the point where kubelet gets bad info - atleast for 1.18 this is true, I didn't try 1.17, but same validation code exists there. Is this an issue? This seems like the right behavior.

Are you concerned about the case where apiserver has been upgraded, but kubelet has yet to be?

In that event, the old kubelet will not recognize these new fields that apiserver lets through - and it results in container start error. So not great UX - "sometimes pod starts, and sometimes it won't" - depending on whether pod lands on and updated kubelet or not, until upgrade process complete.

Is there a way we can handle this?

thockin · 2020-07-01T18:48:47Z

@vinaykul PTAL at kubernetes/enhancements#1883

vinaykul · 2020-07-01T21:58:56Z

/assign @tallclair

vinaykul · 2020-07-02T17:24:00Z

Notes from today's SIG-scheduling weekly meeting discussion on this feature:

Scheduler change looks OK, but consider expanding the container.ResourcesAllocated change to InitContainers as well to keep everything uniform.
- admission plugin should set ResourcesAllocated for init containers as well in this case.
- validation should reject attempt to mutate init container.
Verify if this works for upgrade

/cc @ahg-g

pkg/scheduler/framework/v1alpha1/types.go

k8s-ci-robot · 2020-07-07T17:11:14Z

@vinaykul: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vinaykul · 2020-09-24T17:28:02Z

I'm closing this PR and will start a new one.

For posterity, we decided that previously approved API changes are not what we really wanted, and we are revisiting the KEP as per discussion in PR kubernetes/enhancements#1883

I'm in the process of refactoring this PR with API changes as per the above discussion. I'll close this PR and start a new one.

@alejandrox1 I'll create a separate issue and a PR that fixes LookForStringInPodExec instead of adding LookForStringInPodExecToContainer as discussed in #92127 (comment) . The usage of LookForStringInPodExec is still very low this is a zero functionality change, so it should be a breeze to review.

The e2e tests that consume this will come with the main PR for this feature.

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 14, 2020

k8s-ci-robot requested review from derekwaynecarr and yue9944882 June 14, 2020 22:58

k8s-ci-robot assigned ahg-g, dashpole, dchen1107, derekwaynecarr and liggitt Jun 14, 2020

k8s-ci-robot unassigned neolit123 Jun 30, 2020

thockin reviewed Jul 1, 2020

View reviewed changes

liggitt reviewed Jul 1, 2020

View reviewed changes

k8s-ci-robot assigned tallclair Jul 1, 2020

vinaykul mentioned this pull request Jul 2, 2020

API modifications for in-place pod resize KEP kubernetes/enhancements#1883

Merged

k8s-ci-robot requested review from ahg-g and liggitt July 2, 2020 17:24

ahg-g reviewed Jul 3, 2020

View reviewed changes

pkg/scheduler/framework/v1alpha1/types.go Show resolved Hide resolved

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 7, 2020

mikechengwei mentioned this pull request Sep 18, 2020

Add pod in-place feature. pingcap/tidb-operator#3287

Open

vinaykul mentioned this pull request Sep 24, 2020

In-place rolling updates #9043

Closed

vinaykul closed this Sep 24, 2020

vinaykul mentioned this pull request Sep 25, 2020

Add LookForStringInPodExecToContainer(...) that takes container name as argument. #95055

Merged

vinaykul mentioned this pull request May 7, 2021

Request PRR approval for two KEPs: "In-place Pod Vertical Scaling" and "Kubelet CRI extension for In-Place Pod Vertical Scaling" kubernetes/enhancements#2474

Merged

vinaykul mentioned this pull request Jun 22, 2021

In-place Pod Vertical Scaling feature #102884

Merged

vinaykul mentioned this pull request Mar 22, 2023

Fix pod object update that may cause data race #116702

Merged

	// InspectFeatureGates allows setting bools without taking a dep on a global variable
	func (r *RuntimeClass) InspectFeatureGates(featureGates featuregate.FeatureGate) {
	r.runtimeClassEnabled = featureGates.Enabled(features.RuntimeClass)
	r.podOverheadEnabled = featureGates.Enabled(features.PodOverhead)
	r.inspectedFeatures = true
	}

	// ValidateInitialization implements the WantsExternalKubeInformerFactory interface.
	func (r *RuntimeClass) ValidateInitialization() error {
	if !r.inspectedFeatures {
	return fmt.Errorf("InspectFeatureGates was not called")
	}
	if !r.runtimeClassEnabled {
	return nil
	}

	// Admit makes an admission decision based on the request attributes
	func (r *RuntimeClass) Admit(ctx context.Context, attributes admission.Attributes, o admission.ObjectInterfaces) error {
	if !r.runtimeClassEnabled {
	return nil
	}

	if (c.ResourcesAllocated != nil && len(c.ResourcesAllocated) > 0) \|\| len(c.ResizePolicy) > 0 {
	if len(c.ResourcesAllocated) > 0 \|\| len(c.ResizePolicy) > 0 {

In-Place Pod Vertical Scaling feature #92127

In-Place Pod Vertical Scaling feature #92127

Conversation

vinaykul commented Jun 14, 2020 • edited Loading

k8s-ci-robot commented Jun 14, 2020

k8s-ci-robot commented Jun 14, 2020

vinaykul commented Jun 14, 2020

neolit123 commented Jun 30, 2020

vinaykul commented Jul 1, 2020

thockin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin Jul 1, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vinaykul Jul 2, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vinaykul Jul 7, 2020 • edited Loading

Choose a reason for hiding this comment

thockin commented Jul 1, 2020 • edited Loading

vinaykul commented Jul 1, 2020 • edited Loading

vinaykul commented Jul 2, 2020 • edited Loading

k8s-ci-robot commented Jul 7, 2020

vinaykul commented Sep 24, 2020

vinaykul commented Jun 14, 2020 •

edited

Loading

thockin Jul 1, 2020 •

edited

Loading

vinaykul Jul 2, 2020 •

edited

Loading

vinaykul Jul 7, 2020 •

edited

Loading

thockin commented Jul 1, 2020 •

edited

Loading

vinaykul commented Jul 1, 2020 •

edited

Loading

vinaykul commented Jul 2, 2020 •

edited

Loading