
EventedPLEG: Update global cache timestamp more frequently #127954

Closed
wants to merge 1 commit

Conversation

Contributor

@hshiina hshiina commented Oct 9, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

This fix shortens the period at which the global cache timestamp for Evented PLEG is updated, so that pod workers stuck in cache.GetNewerThan() are unblocked as quickly as with Generic PLEG. Before updating the global timestamp, the fix calls the Version() CRI API to confirm there is no problem communicating with the runtime. If that call succeeds, we can assume that pods which have not received any events are already in their latest status.
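
Taken together, the two hunks reviewed further down amount to roughly the following sketch (simplified, not the verbatim PR diff; the final UpdateTime call stands in for what is assumed to be the existing body of updateGlobalCache):

// Sketch assembled from the diff hunks discussed below; simplified, not verbatim PR code.

// Lowered from 5s so that stuck pod workers are unblocked as often as Generic PLEG relists.
const globalCacheUpdatePeriod = 1 * time.Second

func (e *EventedPLEG) updateGlobalCache() {
    // Confirm that there is no communication problem with the runtime before
    // declaring the whole cache current.
    if _, err := e.runtimeService.Version(context.TODO(), ""); err != nil {
        e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updating the global cache")
        return
    }
    // The runtime is reachable, so pods that received no events can be assumed to be
    // in their latest state; advance the global timestamp to release pod workers
    // blocked in cache.GetNewerThan().
    e.cache.UpdateTime(time.Now())
}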

In order to verify this PR, run pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e and confirm the following failures don't appear:

E2eNode Suite: [It] [sig-node] [NodeFeature:SidecarContainers] Containers Lifecycle [It] should not hang in termination if terminated during initialization [sig-node, NodeFeature:SidecarContainers]
{ failed [FAILED] should delete in < 5 seconds, took 10.071795
Expected
    <float64>: 10.071794756
to be <
    <int64>: 5
In [It] at: k8s.io/kubernetes/test/e2e_node/container_lifecycle_test.go:3470 @ 09/10/24 08:54:31.819
}

E2eNode Suite.[It] [sig-node] [NodeConformance] Containers Lifecycle when a pod is terminating because its liveness probe fails should continue running liveness probes for restartable init containers and restart them while in preStop [NodeConformance]
{ failed [FAILED] Expected an error to have occurred.  Got:
    <nil>: nil
In [It] at: k8s.io/kubernetes/test/e2e_node/container_lifecycle_test.go:903 @ 08/23/24 18:16:25.131
}

Which issue(s) this PR fixes:

Fixes #124704, Fixes #127312

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2024
@k8s-ci-robot
Contributor

Hi @hshiina. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 9, 2024
@@ -150,6 +152,13 @@ func (e *EventedPLEG) Stop() {
// called periodically to update the global timestamp of the cache so that those
// pods stuck at GetNewerThan in pod workers will get unstuck.
func (e *EventedPLEG) updateGlobalCache() {
	// Confirm that there is no communication problem with the runtime.
	if _, err := e.runtimeService.Version(context.TODO(), ""); err != nil {
		e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updaing the global cache")
Member

Suggested change
e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updaing the global cache")
e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updating the global cache")

@adrianmoisey
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2024
This fix makes the period to update the global cache timestamp for
Evented PLEG shorter in order to unblock pod workers that get stuck
in `cache.GetNewerThan()` as fast as Generic PLEG. Before updating the
global timestamp, this fix calls Version() CRI API in order to confirm
there is no problem with communicating with the runtime. By this, we
can assume pods that have not been receiving events are in the latest
status.
@hshiina hshiina force-pushed the evented-pleg-keepalive branch from bbb5c2a to 13edfa4 on October 9, 2024 15:03
@bart0sh
Contributor

bart0sh commented Oct 9, 2024

/retest

@bart0sh
Contributor

bart0sh commented Oct 9, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

@haircommander
Contributor

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 16, 2024
@pacoxu
Member

pacoxu commented Dec 6, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/lgtm
/approve
/assign @smarterclayton @SergeyKanzhelev @yujuhong

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a2247d8552a30a39c85d06c962174ee7675cbe71

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hshiina, pacoxu
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu
Member

pacoxu commented Dec 6, 2024

@yxxhero
Member

yxxhero commented Dec 6, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

@yxxhero
Member

yxxhero commented Dec 6, 2024

/lgtm

const globalCacheUpdatePeriod = 5 * time.Second
// This is the same value as the relist period of Generic PLEG
// in order to unblock pod workers as frequently as Generic PLEG does.
const globalCacheUpdatePeriod = 1 * time.Second
Contributor

If Evented PLEG also relists every 1 second, then it is as good as just having Generic PLEG, right?

genericPlegRelistPeriod = time.Second * 1

What is the advantage of using Evented PLEG in that case?

Contributor Author

With this change, Evented PLEG just updates the global timestamp every second, while Generic PLEG goes through all containers to see whether their states have changed. I think Evented PLEG is still lighter than Generic PLEG.
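
As a rough illustration of that difference in per-tick work, here is a toy sketch (hypothetical fakeRuntime, not kubelet code): the relist path touches every container on each tick, while the global-cache update touches none.

package main

import (
    "fmt"
    "time"
)

// fakeRuntime is a hypothetical stand-in for the container runtime.
type fakeRuntime struct {
    containers []string
}

// relistTick mimics the Generic PLEG relist: every container is enumerated on every tick.
func relistTick(r *fakeRuntime) int {
    inspected := 0
    for range r.containers {
        inspected++ // a real relist would fetch and diff each container's status here
    }
    return inspected
}

// globalCacheTick mimics the Evented PLEG path with this PR: a single timestamp bump,
// regardless of how many pods and containers exist.
func globalCacheTick(now time.Time, global *time.Time) {
    // a real implementation would first confirm runtime health via the CRI Version() RPC
    *global = now
}

func main() {
    r := &fakeRuntime{containers: make([]string, 500)}
    var global time.Time

    fmt.Println("generic relist inspected:", relistTick(r), "containers")
    globalCacheTick(time.Now(), &global)
    fmt.Println("evented update inspected: 0 containers; global timestamp =", global.Format(time.RFC3339Nano))
}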

@HirazawaUi
Contributor

I think the fix proposed in this PR does not address the root cause of the issue. The problem arises after introducing the evented PLEG, where the podWorkerLoop and the evented PLEG process events in parallel. Because the Sync*Pod methods take a significant amount of time, the podWorkerLoop may update lastSyncTime after the PLEG has already recorded the container event. Consequently, the podWorkerLoop must wait for the evented PLEG to update the global cache before it can proceed. This is undoubtedly incorrect behavior.

To verify this, I injected some debugging logs into pod_workers.go and evented.go, then observed the logs by quickly creating and deleting a pod. The following changes were made:

diff --git a/pkg/kubelet/pod_workers.go b/pkg/kubelet/pod_workers.go
index 30850d830f6..40351e7f83d 100644
--- a/pkg/kubelet/pod_workers.go
+++ b/pkg/kubelet/pod_workers.go
@@ -1253,7 +1253,9 @@ func (p *podWorkers) podWorkerLoop(podUID types.UID, podUpdates <-chan struct{})
                                //  Improving this latency also reduces the possibility that a terminated
                                //  container's status is garbage collected before we have a chance to update the
                                //  API server (thus losing the exit code).
+                               klog.Info("---------debug-------: waiting new event")
                                status, err = p.podCache.GetNewerThan(update.Options.Pod.UID, lastSyncTime)
+                               klog.Info("---------debug-------: sync begining")

                                if err != nil {
                                        // This is the legacy event thrown by manage pod loop all other events are now dispatched
@@ -1267,6 +1269,7 @@ func (p *podWorkers) podWorkerLoop(podUID types.UID, podUpdates <-chan struct{})
                        switch {
                        case update.WorkType == TerminatedPod:
                                err = p.podSyncer.SyncTerminatedPod(ctx, update.Options.Pod, status)
+                               klog.Info("---------debug-------: sync terminated pod end")

                        case update.WorkType == TerminatingPod:
                                var gracePeriod *int64
@@ -1281,9 +1284,11 @@ func (p *podWorkers) podWorkerLoop(podUID types.UID, podUpdates <-chan struct{})
                                } else {
                                        err = p.podSyncer.SyncTerminatingPod(ctx, update.Options.Pod, status, gracePeriod, podStatusFn)
                                }
+                               klog.Info("---------debug-------: sync terminating pod end")

                        default:
                                isTerminal, err = p.podSyncer.SyncPod(ctx, update.Options.UpdateType, update.Options.Pod, update.Options.MirrorPod, status)
+                               klog.Info("---------debug-------: default sync end")
                        }

                        lastSyncTime = p.clock.Now()
---
diff --git a/pkg/kubelet/pleg/evented.go b/pkg/kubelet/pleg/evented.go
index 39e13b55223..b8c867b69be 100644
--- a/pkg/kubelet/pleg/evented.go
+++ b/pkg/kubelet/pleg/evented.go
@@ -285,6 +285,7 @@ func (e *EventedPLEG) processCRIEvent(event *runtimeapi.ContainerEventResponse)
        switch event.ContainerEventType {
        case runtimeapi.ContainerEventType_CONTAINER_STOPPED_EVENT:
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerDied, Data: event.ContainerId})
+               klog.Infof("---------debug-------: container event type: %s", ContainerDied)
                e.logger.V(4).Info("Received Container Stopped Event", "event", event.String())
        case runtimeapi.ContainerEventType_CONTAINER_CREATED_EVENT:
                // We only need to update the pod status on container create.
@@ -294,11 +295,14 @@ func (e *EventedPLEG) processCRIEvent(event *runtimeapi.ContainerEventResponse)
                // https://github.com/kubernetes/kubernetes/blob/24753aa8a4df8d10bfd6330e0f29186000c018be/pkg/kubelet/pleg/generic.go#L273
                e.logger.V(4).Info("Received Container Created Event", "event", event.String())
        case runtimeapi.ContainerEventType_CONTAINER_STARTED_EVENT:
+               klog.Infof("---------debug-------: container event type: %s", ContainerStarted)
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerStarted, Data: event.ContainerId})
                e.logger.V(4).Info("Received Container Started Event", "event", event.String())
        case runtimeapi.ContainerEventType_CONTAINER_DELETED_EVENT:
                // In case the pod is deleted it is safe to generate both ContainerDied and ContainerRemoved events, just like in the case of
                // Generic PLEG. https://github.com/kubernetes/kubernetes/blob/24753aa8a4df8d10bfd6330e0f29186000c018be/pkg/kubelet/pleg/generic.go#L169
+               klog.Infof("---------debug-------: container event type: %s", ContainerDied)
+               klog.Infof("---------debug-------: container event type: %s", ContainerRemoved)
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerDied, Data: event.ContainerId})
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerRemoved, Data: event.ContainerId})
                e.logger.V(4).Info("Received Container Deleted Event", "event", event)

The resulting logs show:

I1215 16:49:38.371176 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:38.371184 4116072 pod_workers.go:1258] ---------debug-------: sync begining
I1215 16:49:38.742090 4116072 evented.go:299] ---------debug-------: container event type: ContainerStarted
I1215 16:49:38.775174 4116072 evented.go:299] ---------debug-------: container event type: ContainerStarted
I1215 16:49:38.775301 4116072 pod_workers.go:1291] ---------debug-------: default sync end
I1215 16:49:38.775313 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:43.115686 4116072 pod_workers.go:1258] ---------debug-------: sync begining
I1215 16:49:43.116762 4116072 pod_workers.go:1291] ---------debug-------: default sync end
I1215 16:49:44.188646 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:48.116031 4116072 pod_workers.go:1258] ---------debug-------: sync begining    // Start executing the Sync*Pod method
I1215 16:49:48.233793 4116072 evented.go:289] ---------debug-------: container event type: ContainerDied // evented PLEG update container event
I1215 16:49:48.256265 4116072 evented.go:289] ---------debug-------: container event type: ContainerDied // evented PLEG update container event
I1215 16:49:48.430516 4116072 pod_workers.go:1287] ---------debug-------: sync terminating pod end // End execution of Sync*Pod methods and update lastSyncTime
I1215 16:49:48.430537 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:53.116925 4116072 pod_workers.go:1258] ---------debug-------: sync begining
I1215 16:49:53.117309 4116072 pod_workers.go:1272] ---------debug-------: sync terminated pod end

From the logs, it is evident that PLEG updates container events while Sync*Pod is still in progress, and before lastSyncTime is updated.
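
As a minimal, self-contained illustration (toy types standing in for the kubelet pod cache, not its real implementation), the sketch below reproduces this sequence: the event is recorded before lastSyncTime is taken, so GetNewerThan can only be satisfied by the next periodic global update.

package main

import (
    "fmt"
    "sync"
    "time"
)

// toyCache mimics the GetNewerThan/UpdateTime semantics of the kubelet pod cache
// (a hypothetical simplification, not the real types).
type toyCache struct {
    mu        sync.Mutex
    cond      *sync.Cond
    global    time.Time            // last time the whole cache was known to be current
    podStatus map[string]time.Time // per-pod update timestamps
}

func newToyCache() *toyCache {
    c := &toyCache{podStatus: map[string]time.Time{}}
    c.cond = sync.NewCond(&c.mu)
    return c
}

// GetNewerThan blocks until the pod's entry or the global timestamp is newer than minTime.
func (c *toyCache) GetNewerThan(pod string, minTime time.Time) time.Time {
    c.mu.Lock()
    defer c.mu.Unlock()
    for {
        if t, ok := c.podStatus[pod]; ok && t.After(minTime) {
            return t
        }
        if c.global.After(minTime) {
            return c.global
        }
        c.cond.Wait()
    }
}

// Set records a pod-specific update, as the evented PLEG does on a CRI event.
func (c *toyCache) Set(pod string, at time.Time) {
    c.mu.Lock()
    c.podStatus[pod] = at
    c.mu.Unlock()
    c.cond.Broadcast()
}

// UpdateTime advances the global timestamp, as updateGlobalCache does periodically.
func (c *toyCache) UpdateTime(at time.Time) {
    c.mu.Lock()
    c.global = at
    c.mu.Unlock()
    c.cond.Broadcast()
}

func main() {
    cache := newToyCache()

    // The evented PLEG delivers ContainerDied while Sync*Pod is still running.
    go func() {
        time.Sleep(100 * time.Millisecond)
        cache.Set("pod-a", time.Now())
    }()

    // Periodic global update standing in for updateGlobalCache (1s with this PR, 5s before).
    go func() {
        for {
            time.Sleep(1 * time.Second)
            cache.UpdateTime(time.Now())
        }
    }()

    // The pod worker finishes Sync*Pod after the event arrived, so lastSyncTime is newer
    // than the event's timestamp and GetNewerThan blocks until the next global update.
    time.Sleep(300 * time.Millisecond)
    lastSyncTime := time.Now()
    start := time.Now()
    cache.GetNewerThan("pod-a", lastSyncTime)
    fmt.Printf("pod worker was blocked for %v\n", time.Since(start).Round(10*time.Millisecond))
}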

Although the current PR mitigates the issue by letting the podWorkerLoop be unblocked more frequently when evented PLEG is enabled, it masks the bug introduced with evented PLEG and might cause additional issues. Therefore, I believe the fix does not address the root cause.

If I am mistaken, please let me know.

@hshiina
Contributor Author

hshiina commented Dec 15, 2024

As pointed out, this PR is not a fundamental fix. Ideally, I think we should solve these TODOs:

// TODO: have KillPod return the terminal status of stopped containers and write that into the
// cache immediately

// TODO: this adds ~1s of latency on all transitions from sync to terminating
// to terminated, and on all termination retries (including evictions). We should
// improve latency by making the pleg continuous and by allowing pod status
// changes to be refreshed when key events happen (killPod, sync->terminating).

@HirazawaUi
Contributor

As pointed out, this PR is not a fundamental fix. Ideally, I think we should solve these TODOs:

EventedPLEG is currently an alpha feature, and it has been operating incorrectly. What we need is not a temporary fix but to ensure it functions as expected before it progresses to the beta stage. If possible, we should also consider revising the content of the KEP retrospectively.

@pacoxu
Member

pacoxu commented Dec 30, 2024

As pointed out, this PR is not a fundamental fix. Ideally, I think we should solve these TODOs.

Agreed. Can we merge this first, since it is a safe improvement to Evented PLEG?

  • Will increasing the frequency make existing issues more prominent? @hshiina

@HirazawaUi
Contributor

/test pull-kubernetes-e2e-kind-evented-pleg

@HirazawaUi
Contributor

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

@k8s-ci-robot
Contributor

@hshiina: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2 | 13edfa4 | false | /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e | 13edfa4 | false | /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hshiina
Contributor Author

hshiina commented Jan 18, 2025

/close
#129355 will fix this issue.

@k8s-ci-robot
Contributor

@hshiina: Closed this PR.

In response to this:

/close
#129355 will fix this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
area/kubelet
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
priority/important-longterm - Important over the long term, but may not be staffed and/or may need multiple releases to complete.
release-note-none - Denotes a PR that doesn't merit a release note.
sig/node - Categorizes an issue or PR as relevant to SIG Node.
size/S - Denotes a PR that changes 10-29 lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.