testing: Fix pod delete timeout failures after InPlacePodVerticalScaling Graduate to Beta commit #128880
base: master
Conversation
8617463 to 55ad72d
@esotsal: GitHub didn't allow me to request PR reviews from the following users: hshiina. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/triage accepted
/test pull-kubernetes-e2e-capz-windows-master
/test
@kannon92: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-kubernetes-node-kubelet-serial-containerd
/hold
Let's confirm that this fixes the CI issues before merge.
/test pull-kubernetes-node-kubelet-serial-containerd
test/e2e/common/node/pod_resize.go (outdated)
@@ -1006,7 +1008,7 @@ func doPodResizeTests(f *framework.Framework) {
 	}

 	ginkgo.By("deleting pod")
-	framework.ExpectNoError(podClient.Delete(ctx, newPod.Name, metav1.DeleteOptions{}))
+	podClient.DeleteSync(ctx, newPod.Name, metav1.DeleteOptions{}, timeouts.PodDelete)
why is the Delete call erroring now?
why should we ignore the error?
I agree using DeleteSync is better and very likely to fix, or at least significantly improve, the flakiness/failures we observed lately. But I also agree we should not ignore the error, or at the very least we should clearly document why.
We don't ignore the error, DeleteSync handles it internally, or does it?
kubernetes/test/e2e/framework/pod/pod_client.go
Lines 217 to 225 in c9092f6
// DeleteSync deletes the pod and wait for the pod to disappear for `timeout`. If the pod doesn't
// disappear before the timeout, it will fail the test.
func (c *PodClient) DeleteSync(ctx context.Context, name string, options metav1.DeleteOptions, timeout time.Duration) {
	err := c.Delete(ctx, name, options)
	if err != nil && !apierrors.IsNotFound(err) {
		framework.Failf("Failed to delete pod %q: %v", name, err)
	}
	framework.ExpectNoError(WaitForPodNotFoundInNamespace(ctx, c.f.ClientSet, name, c.namespace, timeout), "wait for pod %q to disappear", name)
}
My bad, podClient.DeleteSync checks for the error internally:
kubernetes/test/e2e/framework/pod/pod_client.go
Lines 219 to 225 in c9092f6 (same DeleteSync implementation quoted above)
/lgtm
Got it, thanks for the extra explanation @esotsal.
Could we replace the command in a pod for resize tests with InfiniteSleepCommand, or insert a trap for handling SIGTERM? I think this blocks pod deletion for 30 seconds:
kubernetes/test/e2e/framework/pod/resize.go
Line 161 in c9092f6
cmd := "grep Cpus_allowed_list /proc/self/status | cut -f2 && sleep 1d"
If the pods terminate faster that would help a lot, and we can and should apply the same fix in quite a few other places, but we will still need guaranteed cleanup between tests in the lanes marked serial (or we need to rethink a bunch of the resource management tests from scratch).
@tallclair I think stability should be more important than test duration. Yes, the flakes are due to undeleted pods, please check here.
Does this mean that we do not have to revert the promotion #128875?
We still need to fix #128917 (although it's an easy fix), and I don't think the windows test failures are due to undeleted pods (though I could be wrong).
Could we replace the command in a pod for resize tests with InfiniteSleepCommand or insert a trap for handling SIGTERM? I think this blocks pod deletion for 30 seconds:
kubernetes/test/e2e/framework/pod/resize.go
Line 161 in c9092f6
cmd := "grep Cpus_allowed_list /proc/self/status | cut -f2 && sleep 1d"

Missed this one, sorry @hshiina. Acknowledged, will give it a try as well.
@@ -549,6 +550,9 @@ func containerSucceeded(c *v1.Container, podStatus *kubecontainer.PodStatus) bool {
 	}

+func IsInPlacePodVerticalScalingAllowed(pod *v1.Pod) bool {
+	if runtime.GOOS == "windows" {
This looks darn important!!!
And ideally, this shouldn't be a runtime check. Isn't there something to mark a test as "does not work on Windows"?
We may move this function IsInPlacePodVerticalScalingAllowed to pkg/kubelet/kuberuntime/kuberuntime_manager_linux.go and (pkg/kubelet/kuberuntime/kuberuntime_manager_windows.go or kuberuntime_manager_others.go). Add "[LinuxOnly]" to the e2e test?
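A minimal sketch of that split, assuming a platform-gating helper whose name is invented here for illustration (the real function name, file layout, and any feature-gate checks may differ):

// kuberuntime_manager_linux.go (sketch)
//go:build linux

package kuberuntime

// inPlacePodVerticalScalingSupportedByOS reports whether the host OS supports
// in-place pod resize; on Linux it does.
func inPlacePodVerticalScalingSupportedByOS() bool { return true }

// kuberuntime_manager_windows.go / kuberuntime_manager_others.go (sketch)
//go:build !linux

package kuberuntime

// On non-Linux platforms (e.g. Windows) in-place pod resize is not supported;
// IsInPlacePodVerticalScalingAllowed could call this helper instead of
// checking runtime.GOOS at run time.
func inPlacePodVerticalScalingSupportedByOS() bool { return false }

With build constraints the compiler selects the right file per platform, and the e2e test could additionally carry "[LinuxOnly]" so it is skipped on Windows CI lanes.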
@@ -1006,7 +1006,7 @@ func doPodResizeTests(f *framework.Framework) {
 	}

 	ginkgo.By("deleting pod")
-	framework.ExpectNoError(podClient.Delete(ctx, newPod.Name, metav1.DeleteOptions{}))
+	podClient.DeleteSync(ctx, newPod.Name, metav1.DeleteOptions{}, f.Timeouts.PodDelete)
Are we OK to skip checking the error here?
We don't ignore the error, DeleteSync handles it internally:
kubernetes/test/e2e/framework/pod/pod_client.go
Lines 217 to 225 in c9092f6
// DeleteSync deletes the pod and wait for the pod to disappear for `timeout`. If the pod doesn't
// disappear before the timeout, it will fail the test.
func (c *PodClient) DeleteSync(ctx context.Context, name string, options metav1.DeleteOptions, timeout time.Duration) {
	err := c.Delete(ctx, name, options)
	if err != nil && !apierrors.IsNotFound(err) {
		framework.Failf("Failed to delete pod %q: %v", name, err)
	}
	framework.ExpectNoError(WaitForPodNotFoundInNamespace(ctx, c.f.ClientSet, name, c.namespace, timeout), "wait for pod %q to disappear", name)
}
Ack. I hadn't noticed the switch from Delete to DeleteSync.
Nice thanks @esotsal
FWIW, DeleteSync is not aligned with our E2E framework guidelines (it should return an error), but it is what it is.
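For illustration only, a guideline-conforming variant would propagate the error instead of failing inside the helper. The DeleteSyncE name below is hypothetical, not an existing framework function; it reuses the calls from the DeleteSync snippet quoted above and would additionally need the fmt import:

// Hypothetical sketch: an error-returning counterpart to DeleteSync, letting the
// caller decide how to handle delete/wait failures.
func (c *PodClient) DeleteSyncE(ctx context.Context, name string, options metav1.DeleteOptions, timeout time.Duration) error {
	if err := c.Delete(ctx, name, options); err != nil && !apierrors.IsNotFound(err) {
		return fmt.Errorf("failed to delete pod %q: %w", name, err)
	}
	if err := WaitForPodNotFoundInNamespace(ctx, c.f.ClientSet, name, c.namespace, timeout); err != nil {
		return fmt.Errorf("wait for pod %q to disappear: %w", name, err)
	}
	return nil
}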
@@ -115,7 +115,7 @@ var _ = SIGDescribe("Container Lifecycle Hook", func() {
 		}, postStartWaitTimeout, podCheckInterval).Should(gomega.BeNil())
 	}
 	ginkgo.By("delete the pod with lifecycle hook")
-	podClient.DeleteSync(ctx, podWithHook.Name, *metav1.NewDeleteOptions(15), e2epod.DefaultPodDeletionTimeout)
+	podClient.DeleteSync(ctx, podWithHook.Name, *metav1.NewDeleteOptions(15), f.Timeouts.PodDelete)
Can you please call out what is the difference between the two timeouts and highlight it in the PR text?
I am guessing the new one is longer than the old one?
Yes:
DefaultPodDeletionTimeout = 3 * time.Minute
vs.
kubernetes/test/e2e/framework/timeouts.go
Line 26 in 35d098a
PodDelete: 5 * time.Minute,
please add this info to the PR text
Done
test/e2e_node/criproxy_test.go (outdated)
ginkgo.AfterEach(func() {
	err := resetCRIProxyInjector(e2eCriProxy)
	framework.ExpectNoError(err)
ginkgo.DeferCleanup(func() {
this pattern is way better. thanks!
f1b71b9 to 4cf1a3d
/test pull-kubernetes-node-kubelet-serial-containerd
It looks like currently no job runs tests with this feature enabled. Especially, ...
Thanks for letting us know. For this commit I have not reverted the beta, in order to debug the issue; InPlacePodVerticalScaling should be enabled by default. Being beta might explain why pull-kubernetes-node-kubelet-serial-containerd-alpha-features does not run the tests?
I think I will rebase this commit, the findings are sufficient. Thank you all for your comments and feedback.
Will add it as an extra step when the beta is merged back, so as not to forget.
726dc3e to 08a700f
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
08a700f to bb94a06
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aojea, esotsal. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest
/test pull-kubernetes-verify-lint
@esotsal: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/hold cancel
/retest
/test pull-kubernetes-e2e-gce-cos-alpha-features
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
Please check analysis here and conversation in this PR below.
In brief, this commit modifies testing code and ensures the same delete timeout duration is used everywhere. More specifically, it replaces the pod delete timeouts used before with the framework's Timeouts.PodDelete; the previous timeouts were less than 5 minutes, please see the full commit for details.
kubernetes/test/e2e/framework/timeouts.go
Line 26 in 35d098a
Delete is replaced with DeleteSync in the pod_resize test, with a 5-minute timeout, to ensure pods are actually deleted and do not impact other tests that were not resilient to leftover pods.
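The pattern of the change, taken from the pod_resize diff above, is roughly the following:

// Before: fire-and-forget delete; the pod may still be terminating when later tests run.
framework.ExpectNoError(podClient.Delete(ctx, newPod.Name, metav1.DeleteOptions{}))

// After: block until the pod is gone, failing the test if it is not removed
// within the framework's 5-minute PodDelete timeout.
podClient.DeleteSync(ctx, newPod.Name, metav1.DeleteOptions{}, f.Timeouts.PodDelete)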
Which issues this PR fixes
Relates to #128837
Fixes test failures, due to pods not being deleted, that appeared with the introduction of the InPlacePodVerticalScaling Beta.
Special notes for your reviewer:
Please see files changes for more details.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: