Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: Fix pod delete timeout failures after InPlacePodVerticalScaling Graduate to Beta commit #128880

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

esotsal
Copy link

@esotsal esotsal commented Nov 20, 2024

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

Please check analysis here and conversation in this PR below.

In brief this commit modifies testing code, and ensure same delete timeout duration is used, more specificaly

  • replaced DeletePod timeouts used before with framework.timeouts.PodDelete, previous timeouts where less than 5 minutes, please see full commit for details.

    PodDelete: 5 * time.Minute,

  • Replace of Delete with DeleteSync pod_resize test, with a 5 minute timeout, to ensure Pods are Deleted and will not impact some tests which were not resilient with the existence of pods.

Which issues this PR fixes

Relates to #128837
Fixes test failures due to pods not deleted with the introduction of InPlacePodVerticalScaling Beta

Special notes for your reviewer:

Please see files changes for more details.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 20, 2024
@esotsal
Copy link
Author

esotsal commented Nov 20, 2024

/cc @tallclair @hshiina @AnishShah

@k8s-ci-robot
Copy link
Contributor

@esotsal: GitHub didn't allow me to request PR reviews from the following users: hshiina.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @tallclair @hshiina @AnishShah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@esotsal
Copy link
Author

esotsal commented Nov 20, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 20, 2024
@kannon92
Copy link
Contributor

/test pull-kubernetes-e2e-capz-windows-master

@kannon92
Copy link
Contributor

/test

@k8s-ci-robot
Copy link
Contributor

@kannon92: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cos-containerd-e2e-ubuntu-gce
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-pull-through-cache
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary
  • /test pull-kubernetes-verify-lint

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-crio-cgroupv1-node-e2e-eviction
  • /test pull-crio-cgroupv1-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-features
  • /test pull-crio-cgroupv1-node-e2e-features-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-hugepages
  • /test pull-crio-cgroupv1-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-resource-managers
  • /test pull-crio-cgroupv1-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-imagefs-separatedisktest
  • /test pull-crio-cgroupv2-imagefs-separatedisktest-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-eviction
  • /test pull-crio-cgroupv2-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-hugepages
  • /test pull-crio-cgroupv2-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-resource-managers
  • /test pull-crio-cgroupv2-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-splitfs-separate-disk
  • /test pull-crio-cgroupv2-splitfs-separate-disk-kubetest2
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-e2e-gci-gce-alpha-enabled-default
  • /test pull-kubernetes-apidiff
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2-kubetest2
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-master-windows-nodelogquery
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-arm64
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-ec2-conformance-arm64
  • /test pull-kubernetes-e2e-ec2-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-disruptive-canary
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-policies
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-serial-canary
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-selinux
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-gci-gce-nftables
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-beta-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-evented-pleg
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kind-nftables
  • /test pull-kubernetes-e2e-relaxed-environment-variable-validation
  • /test pull-kubernetes-e2e-storage-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-storage-kind-volume-group-snapshots
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-dra-all
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-alpha-ec2
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2-eks
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-cri-proxy-serial
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra-kubetest2
  • /test pull-kubernetes-node-e2e-resource-health-status
  • /test pull-kubernetes-node-kubelet-containerd-flaky
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-podresize
  • /test pull-kubernetes-node-kubelet-serial-podresources
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-swap-conformance-fedora-serial
  • /test pull-kubernetes-node-swap-conformance-ubuntu-serial
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-scheduler-perf
  • /test pull-kubernetes-unit-experimental
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-gce-canary
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify
  • pull-kubernetes-verify-lint

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kannon92
Copy link
Contributor

/test pull-kubernetes-node-kubelet-serial-containerd

/hold

Let’s confirm that this fixes the ci issues before merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 20, 2024
@dims
Copy link
Member

dims commented Nov 20, 2024

/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

@@ -1006,7 +1008,7 @@ func doPodResizeTests(f *framework.Framework) {
}

ginkgo.By("deleting pod")
framework.ExpectNoError(podClient.Delete(ctx, newPod.Name, metav1.DeleteOptions{}))
podClient.DeleteSync(ctx, newPod.Name, metav1.DeleteOptions{}, timeouts.PodDelete)
Copy link
Member

@aojea aojea Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the Delete call erroring now?
why should we ignore the error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree using DeleteSync is better and very likely to fix or at least significantly improve flakiness/failures we observed lately.
But I also agree we should not ignore the error or at very least clearly documenting why.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont ignore the error , DeleteSync handles it internally or ?

// DeleteSync deletes the pod and wait for the pod to disappear for `timeout`. If the pod doesn't
// disappear before the timeout, it will fail the test.
func (c *PodClient) DeleteSync(ctx context.Context, name string, options metav1.DeleteOptions, timeout time.Duration) {
err := c.Delete(ctx, name, options)
if err != nil && !apierrors.IsNotFound(err) {
framework.Failf("Failed to delete pod %q: %v", name, err)
}
framework.ExpectNoError(WaitForPodNotFoundInNamespace(ctx, c.f.ClientSet, name, c.namespace, timeout), "wait for pod %q to disappear", name)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, pod client.DeleteSync checks for the error internally

func (c *PodClient) DeleteSync(ctx context.Context, name string, options metav1.DeleteOptions, timeout time.Duration) {
err := c.Delete(ctx, name, options)
if err != nil && !apierrors.IsNotFound(err) {
framework.Failf("Failed to delete pod %q: %v", name, err)
}
framework.ExpectNoError(WaitForPodNotFoundInNamespace(ctx, c.f.ClientSet, name, c.namespace, timeout), "wait for pod %q to disappear", name)
}

/lgtm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it thanks for the extra explanation @esotsal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace the command in a pod for resize tests with InfiniteSleepCommand or insert trap for handling SIGTERM? I think this blocks pod deletion for 30 seconds:

cmd := "grep Cpus_allowed_list /proc/self/status | cut -f2 && sleep 1d"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pods terminates faster that would help a lot, and we can and should apply the same fix in quite other places, but still we will need to have guaranteed cleanup between tests in the lane marked serial (or to rethink from scratch a bunch of resource management tests)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tallclair i think stability should be more important to test duration. Yes the flakes are due to undeleted pods, please check here

Does this mean that we do not have to revert the promotion #128875?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to fix #128917 (although it's an easy fix), and I don't think the windows test failures are due to undeleted pods (though I could be wrong).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace the command in a pod for resize tests with InfiniteSleepCommand or insert trap for handling SIGTERM? I think this blocks pod deletion for 30 seconds:

cmd := "grep Cpus_allowed_list /proc/self/status | cut -f2 && sleep 1d"

Missed this one , sorry @hshiina . acknowledged, will give it a try as well.

@@ -549,6 +550,9 @@ func containerSucceeded(c *v1.Container, podStatus *kubecontainer.PodStatus) boo
}

func IsInPlacePodVerticalScalingAllowed(pod *v1.Pod) bool {
if runtime.GOOS == "windows" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks darn important!!!

Copy link
Contributor

@pohly pohly Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ideally, this shouldn't be a runtime check. Isn't there something to mark a test as "does not work on Windows"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may move this function IsInPlacePodVerticalScalingAllowed to pkg/kubelet/kuberuntime/kuberuntime_manager_linux.go and (pkg/kubelet/kuberuntime/kuberuntime_manager_windows.go or kuberuntime_manager_others.go).

Add "[LinuxOnly]" to the e2e test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds worth trying, @pacoxu , @pohly @dims moving discussion here

@@ -1006,7 +1006,7 @@ func doPodResizeTests(f *framework.Framework) {
}

ginkgo.By("deleting pod")
framework.ExpectNoError(podClient.Delete(ctx, newPod.Name, metav1.DeleteOptions{}))
podClient.DeleteSync(ctx, newPod.Name, metav1.DeleteOptions{}, f.Timeouts.PodDelete)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are ok to skip checking the error here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont ignore the error , DeleteSync handles it internally.

// DeleteSync deletes the pod and wait for the pod to disappear for `timeout`. If the pod doesn't
// disappear before the timeout, it will fail the test.
func (c *PodClient) DeleteSync(ctx context.Context, name string, options metav1.DeleteOptions, timeout time.Duration) {
err := c.Delete(ctx, name, options)
if err != nil && !apierrors.IsNotFound(err) {
framework.Failf("Failed to delete pod %q: %v", name, err)
}
framework.ExpectNoError(WaitForPodNotFoundInNamespace(ctx, c.f.ClientSet, name, c.namespace, timeout), "wait for pod %q to disappear", name)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I hadn't noticed the switch from Delete to DeleteSync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks @esotsal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, DeleteSync is not aligned with our E2E framework guidelines (should return an error), but it is what it is.

@@ -115,7 +115,7 @@ var _ = SIGDescribe("Container Lifecycle Hook", func() {
}, postStartWaitTimeout, podCheckInterval).Should(gomega.BeNil())
}
ginkgo.By("delete the pod with lifecycle hook")
podClient.DeleteSync(ctx, podWithHook.Name, *metav1.NewDeleteOptions(15), e2epod.DefaultPodDeletionTimeout)
podClient.DeleteSync(ctx, podWithHook.Name, *metav1.NewDeleteOptions(15), f.Timeouts.PodDelete)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please call out what is the difference between the two timeouts and highlight it in the PR text?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing the new one is longer than the old one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,

DefaultPodDeletionTimeout = 3 * time.Minute

PodDelete: 5 * time.Minute,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add this info to the PR text

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ginkgo.AfterEach(func() {
err := resetCRIProxyInjector(e2eCriProxy)
framework.ExpectNoError(err)
ginkgo.DeferCleanup(func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern is way better. thanks!

@esotsal
Copy link
Author

esotsal commented Nov 22, 2024

I believe pull-kubernetes-e2e-gce-canary failures are not related with InPlacePodVerticalScaling, created #128942 to monitor the flakes.

@esotsal
Copy link
Author

esotsal commented Nov 23, 2024

test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-e2e-ec2
/test pull-kubernetes-e2e-gce-canary
/test pull-kubernetes-e2e-ec2-conformance
/test pull-kubernetes-e2e-capz-windows-master
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2

@hshiina
Copy link
Contributor

hshiina commented Nov 23, 2024

It looks that currently no job runs tests with enabling InPlacePodVerticalScaling.

Especially, pull-kubernetes-node-kubelet-serial-containerd-alpha-features runs no test. Something may be wrong with its label filter:
https://github.com/kubernetes/test-infra/blob/c08c9c08ebb44b03557afeb5dd58b4e81bf9b914/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L660

@esotsal
Copy link
Author

esotsal commented Nov 24, 2024

It looks that currently no job runs tests with enabling InPlacePodVerticalScaling.

Especially, pull-kubernetes-node-kubelet-serial-containerd-alpha-features runs no test. Something may be wrong with its label filter: https://github.com/kubernetes/test-infra/blob/c08c9c08ebb44b03557afeb5dd58b4e81bf9b914/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L660

Thanks for letting us know, for this commit , I have not reverted the beta to debug the issue, InPlacePodVerticalScaling should be enabled by default. Being beta , might explain why pull-kubernetes-node-kubelet-serial-containerd-alpha-features does not run the tests?

@esotsal
Copy link
Author

esotsal commented Nov 24, 2024

I think I will rebase this commit, findings are sufficient. Thank you all for your comments and feedback.

@esotsal
Copy link
Author

esotsal commented Nov 24, 2024

It looks that currently no job runs tests with enabling InPlacePodVerticalScaling.
Especially, pull-kubernetes-node-kubelet-serial-containerd-alpha-features runs no test. Something may be wrong with its label filter: https://github.com/kubernetes/test-infra/blob/c08c9c08ebb44b03557afeb5dd58b4e81bf9b914/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L660

Thanks for letting us know, for this commit , I have not reverted the beta to debug the issue, InPlacePodVerticalScaling should be enabled by default. Being beta , might explain why pull-kubernetes-node-kubelet-serial-containerd-alpha-features does not run the tests?

Will add it as an extra step when the beta is merged back not to forget

@esotsal esotsal force-pushed the esotsal/fix-128837 branch 2 times, most recently from 726dc3e to 08a700f Compare November 25, 2024 08:14
@esotsal
Copy link
Author

esotsal commented Nov 25, 2024

/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
/test pull-kubernetes-node-kubelet-serial-podresize
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
/test pull-kubernetes-e2e-gce-cos-alpha-features
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-e2e-ec2
/test pull-kubernetes-e2e-gce-canary
/test pull-kubernetes-e2e-ec2-conformance
/test pull-kubernetes-e2e-capz-windows-master
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2

@esotsal
Copy link
Author

esotsal commented Nov 28, 2024

Will clean up this PR, keeping only the DeleteSync, removing #128889 and #128936 parts since they have their own PRs. Will update the PR description.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, esotsal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2024
@esotsal esotsal changed the title Debug and try to fix pipeline failures after InPlacePodVerticalScaling Graduate to Beta commit testing: Fix pod delete timeout failures after InPlacePodVerticalScaling Graduate to Beta commit Nov 28, 2024
@esotsal
Copy link
Author

esotsal commented Nov 28, 2024

/retest

@esotsal
Copy link
Author

esotsal commented Nov 28, 2024

/test pull-kubernetes-verify-lint

@k8s-ci-robot
Copy link
Contributor

@esotsal: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cos-containerd-e2e-ubuntu-gce
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-pull-through-cache
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-crio-cgroupv1-node-e2e-eviction
  • /test pull-crio-cgroupv1-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-features
  • /test pull-crio-cgroupv1-node-e2e-features-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-hugepages
  • /test pull-crio-cgroupv1-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-resource-managers
  • /test pull-crio-cgroupv1-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-imagefs-separatedisktest
  • /test pull-crio-cgroupv2-imagefs-separatedisktest-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-eviction
  • /test pull-crio-cgroupv2-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-hugepages
  • /test pull-crio-cgroupv2-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-resource-managers
  • /test pull-crio-cgroupv2-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-splitfs-separate-disk
  • /test pull-crio-cgroupv2-splitfs-separate-disk-kubetest2
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-e2e-gci-gce-alpha-enabled-default
  • /test pull-kubernetes-apidiff
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2-kubetest2
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-master-windows-nodelogquery
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-arm64
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-ec2-conformance-arm64
  • /test pull-kubernetes-e2e-ec2-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-disruptive-canary
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-policies
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-serial-canary
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-selinux
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-gci-gce-nftables
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-beta-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-evented-pleg
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kind-nftables
  • /test pull-kubernetes-e2e-relaxed-environment-variable-validation
  • /test pull-kubernetes-e2e-storage-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-storage-kind-volume-group-snapshots
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-dra-all
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-alpha-ec2
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2-eks
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-cri-proxy-serial
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra-kubetest2
  • /test pull-kubernetes-node-e2e-resource-health-status
  • /test pull-kubernetes-node-kubelet-containerd-flaky
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-podresize
  • /test pull-kubernetes-node-kubelet-serial-podresources
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-swap-conformance-fedora-serial
  • /test pull-kubernetes-node-swap-conformance-ubuntu-serial
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-scheduler-perf
  • /test pull-kubernetes-unit-experimental
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-gce-canary
  • pull-kubernetes-e2e-gce-network-policies
  • pull-kubernetes-e2e-gci-gce-ingress
  • pull-kubernetes-e2e-gci-gce-ipvs
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-e2e-kind-ipvs
  • pull-kubernetes-e2e-kind-nftables
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify

In response to this:

/test pull-kubernetes-verify-lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@esotsal
Copy link
Author

esotsal commented Nov 29, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2024
@esotsal
Copy link
Author

esotsal commented Dec 3, 2024

/retest

@pacoxu
Copy link
Member

pacoxu commented Dec 12, 2024

/test pull-kubernetes-e2e-gce-cos-alpha-features

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Needs Review
Development

Successfully merging this pull request may close these issues.