
EventedPLEG: Update global cache timestamp more frequently #127954

Closed
wants to merge 1 commit

Conversation

Contributor

@hshiina hshiina commented Oct 9, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

This fix shortens the period at which the global cache timestamp for Evented PLEG is updated, so that pod workers stuck in cache.GetNewerThan() are unblocked as quickly as with Generic PLEG. Before updating the global timestamp, the fix calls the Version() CRI API to confirm there is no problem communicating with the runtime. If that call succeeds, we can assume that pods which have not received any events are already in their latest status.
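
Taken together, the two hunks reviewed further down amount to roughly the following sketch (simplified, not the verbatim PR diff; the final UpdateTime call stands in for what is assumed to be the existing body of updateGlobalCache):

// Sketch assembled from the diff hunks discussed below; simplified, not verbatim PR code.

// Lowered from 5s so that stuck pod workers are unblocked as often as Generic PLEG relists.
const globalCacheUpdatePeriod = 1 * time.Second

func (e *EventedPLEG) updateGlobalCache() {
    // Confirm that there is no communication problem with the runtime before
    // declaring the whole cache current.
    if _, err := e.runtimeService.Version(context.TODO(), ""); err != nil {
        e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updating the global cache")
        return
    }
    // The runtime is reachable, so pods that received no events can be assumed to be
    // in their latest state; advance the global timestamp to release pod workers
    // blocked in cache.GetNewerThan().
    e.cache.UpdateTime(time.Now())
}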

In order to verify this PR, run pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e and confirm the following failures don't appear:

E2eNode Suite: [It] [sig-node] [NodeFeature:SidecarContainers] Containers Lifecycle [It] should not hang in termination if terminated during initialization [sig-node, NodeFeature:SidecarContainers]
{ failed [FAILED] should delete in < 5 seconds, took 10.071795
Expected
    <float64>: 10.071794756
to be <
    <int64>: 5
In [It] at: k8s.io/kubernetes/test/e2e_node/container_lifecycle_test.go:3470 @ 09/10/24 08:54:31.819
}

E2eNode Suite.[It] [sig-node] [NodeConformance] Containers Lifecycle when a pod is terminating because its liveness probe fails should continue running liveness probes for restartable init containers and restart them while in preStop [NodeConformance]
{ failed [FAILED] Expected an error to have occurred.  Got:
    <nil>: nil
In [It] at: k8s.io/kubernetes/test/e2e_node/container_lifecycle_test.go:903 @ 08/23/24 18:16:25.131
}

Which issue(s) this PR fixes:

Fixes #124704, Fixes #127312

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2024
@k8s-ci-robot
Contributor

Hi @hshiina. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 9, 2024
@@ -150,6 +152,13 @@ func (e *EventedPLEG) Stop() {
// called periodically to update the global timestamp of the cache so that those
// pods stuck at GetNewerThan in pod workers will get unstuck.
func (e *EventedPLEG) updateGlobalCache() {
	// Confirm that there is no communication problem with the runtime.
	if _, err := e.runtimeService.Version(context.TODO(), ""); err != nil {
		e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updaing the global cache")
Member

Suggested change
e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updaing the global cache")
e.logger.Error(err, "Evented PLEG: failed to communicate with the runtime before updating the global cache")

@adrianmoisey
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2024
This fix makes the period to update the global cache timestamp for
Evented PLEG shorter in order to unblock pod workers that get stuck
in `cache.GetNewerThan()` as fast as Generic PLEG. Before updating the
global timestamp, this fix calls Version() CRI API in order to confirm
there is no problem with communicating with the runtime. By this, we
can assume pods that have not been receiving events are in the latest
status.
@hshiina hshiina force-pushed the evented-pleg-keepalive branch from bbb5c2a to 13edfa4 on October 9, 2024 15:03
@bart0sh
Contributor

bart0sh commented Oct 9, 2024

/retest

@bart0sh
Contributor

bart0sh commented Oct 9, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

@haircommander
Contributor

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 16, 2024
@pacoxu
Member

pacoxu commented Dec 6, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/lgtm
/approve
/assign @smarterclayton @SergeyKanzhelev @yujuhong

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a2247d8552a30a39c85d06c962174ee7675cbe71

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hshiina, pacoxu
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu
Member

pacoxu commented Dec 6, 2024

@yxxhero
Member

yxxhero commented Dec 6, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

@yxxhero
Member

yxxhero commented Dec 6, 2024

/lgtm

const globalCacheUpdatePeriod = 5 * time.Second
// This is the same value as the relist period of Generic PLEG
// in order to unblock pod workers as frequently as Generic PLEG does.
const globalCacheUpdatePeriod = 1 * time.Second
Contributor

If Evented PLEG also relists every 1 second, then it is as good as just having Generic PLEG, right?

genericPlegRelistPeriod = time.Second * 1

What is the advantage of using Evented PLEG in that case?

Contributor Author

With this change, Evented PLEG just updates the global timestamp every second, while Generic PLEG goes through all containers to see whether their states have changed. I think Evented PLEG is still lighter than Generic PLEG.
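
As a rough illustration of that difference in per-tick work, here is a toy sketch (hypothetical fakeRuntime, not kubelet code): the relist path touches every container on each tick, while the global-cache update touches none.

package main

import (
    "fmt"
    "time"
)

// fakeRuntime is a hypothetical stand-in for the container runtime.
type fakeRuntime struct {
    containers []string
}

// relistTick mimics the Generic PLEG relist: every container is enumerated on every tick.
func relistTick(r *fakeRuntime) int {
    inspected := 0
    for range r.containers {
        inspected++ // a real relist would fetch and diff each container's status here
    }
    return inspected
}

// globalCacheTick mimics the Evented PLEG path with this PR: a single timestamp bump,
// regardless of how many pods and containers exist.
func globalCacheTick(now time.Time, global *time.Time) {
    // a real implementation would first confirm runtime health via the CRI Version() RPC
    *global = now
}

func main() {
    r := &fakeRuntime{containers: make([]string, 500)}
    var global time.Time

    fmt.Println("generic relist inspected:", relistTick(r), "containers")
    globalCacheTick(time.Now(), &global)
    fmt.Println("evented update inspected: 0 containers; global timestamp =", global.Format(time.RFC3339Nano))
}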

@HirazawaUi
Contributor

I think the fix proposed in this PR does not address the root cause of the issue. The problem arises after introducing the evented PLEG, where the podWorkerLoop and the evented PLEG process events in parallel. Because the Sync*Pod methods take a significant amount of time, the podWorkerLoop may update lastSyncTime after the PLEG has already recorded the container event. Consequently, the podWorkerLoop must wait for the evented PLEG to update the global cache before it can proceed. This is undoubtedly incorrect behavior.

To verify this, I injected some debugging logs into pod_workers.go and evented.go, then observed the logs by quickly creating and deleting a pod. The following changes were made:

diff --git a/pkg/kubelet/pod_workers.go b/pkg/kubelet/pod_workers.go
index 30850d830f6..40351e7f83d 100644
--- a/pkg/kubelet/pod_workers.go
+++ b/pkg/kubelet/pod_workers.go
@@ -1253,7 +1253,9 @@ func (p *podWorkers) podWorkerLoop(podUID types.UID, podUpdates <-chan struct{})
                                //  Improving this latency also reduces the possibility that a terminated
                                //  container's status is garbage collected before we have a chance to update the
                                //  API server (thus losing the exit code).
+                               klog.Info("---------debug-------: waiting new event")
                                status, err = p.podCache.GetNewerThan(update.Options.Pod.UID, lastSyncTime)
+                               klog.Info("---------debug-------: sync begining")

                                if err != nil {
                                        // This is the legacy event thrown by manage pod loop all other events are now dispatched
@@ -1267,6 +1269,7 @@ func (p *podWorkers) podWorkerLoop(podUID types.UID, podUpdates <-chan struct{})
                        switch {
                        case update.WorkType == TerminatedPod:
                                err = p.podSyncer.SyncTerminatedPod(ctx, update.Options.Pod, status)
+                               klog.Info("---------debug-------: sync terminated pod end")

                        case update.WorkType == TerminatingPod:
                                var gracePeriod *int64
@@ -1281,9 +1284,11 @@ func (p *podWorkers) podWorkerLoop(podUID types.UID, podUpdates <-chan struct{})
                                } else {
                                        err = p.podSyncer.SyncTerminatingPod(ctx, update.Options.Pod, status, gracePeriod, podStatusFn)
                                }
+                               klog.Info("---------debug-------: sync terminating pod end")

                        default:
                                isTerminal, err = p.podSyncer.SyncPod(ctx, update.Options.UpdateType, update.Options.Pod, update.Options.MirrorPod, status)
+                               klog.Info("---------debug-------: default sync end")
                        }

                        lastSyncTime = p.clock.Now()
---
diff --git a/pkg/kubelet/pleg/evented.go b/pkg/kubelet/pleg/evented.go
index 39e13b55223..b8c867b69be 100644
--- a/pkg/kubelet/pleg/evented.go
+++ b/pkg/kubelet/pleg/evented.go
@@ -285,6 +285,7 @@ func (e *EventedPLEG) processCRIEvent(event *runtimeapi.ContainerEventResponse)
        switch event.ContainerEventType {
        case runtimeapi.ContainerEventType_CONTAINER_STOPPED_EVENT:
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerDied, Data: event.ContainerId})
+               klog.Infof("---------debug-------: container event type: %s", ContainerDied)
                e.logger.V(4).Info("Received Container Stopped Event", "event", event.String())
        case runtimeapi.ContainerEventType_CONTAINER_CREATED_EVENT:
                // We only need to update the pod status on container create.
@@ -294,11 +295,14 @@ func (e *EventedPLEG) processCRIEvent(event *runtimeapi.ContainerEventResponse)
                // https://github.com/kubernetes/kubernetes/blob/24753aa8a4df8d10bfd6330e0f29186000c018be/pkg/kubelet/pleg/generic.go#L273
                e.logger.V(4).Info("Received Container Created Event", "event", event.String())
        case runtimeapi.ContainerEventType_CONTAINER_STARTED_EVENT:
+               klog.Infof("---------debug-------: container event type: %s", ContainerStarted)
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerStarted, Data: event.ContainerId})
                e.logger.V(4).Info("Received Container Started Event", "event", event.String())
        case runtimeapi.ContainerEventType_CONTAINER_DELETED_EVENT:
                // In case the pod is deleted it is safe to generate both ContainerDied and ContainerRemoved events, just like in the case of
                // Generic PLEG. https://github.com/kubernetes/kubernetes/blob/24753aa8a4df8d10bfd6330e0f29186000c018be/pkg/kubelet/pleg/generic.go#L169
+               klog.Infof("---------debug-------: container event type: %s", ContainerDied)
+               klog.Infof("---------debug-------: container event type: %s", ContainerRemoved)
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerDied, Data: event.ContainerId})
                e.sendPodLifecycleEvent(&PodLifecycleEvent{ID: types.UID(event.PodSandboxStatus.Metadata.Uid), Type: ContainerRemoved, Data: event.ContainerId})
                e.logger.V(4).Info("Received Container Deleted Event", "event", event)

The resulting logs show:

I1215 16:49:38.371176 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:38.371184 4116072 pod_workers.go:1258] ---------debug-------: sync begining
I1215 16:49:38.742090 4116072 evented.go:299] ---------debug-------: container event type: ContainerStarted
I1215 16:49:38.775174 4116072 evented.go:299] ---------debug-------: container event type: ContainerStarted
I1215 16:49:38.775301 4116072 pod_workers.go:1291] ---------debug-------: default sync end
I1215 16:49:38.775313 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:43.115686 4116072 pod_workers.go:1258] ---------debug-------: sync begining
I1215 16:49:43.116762 4116072 pod_workers.go:1291] ---------debug-------: default sync end
I1215 16:49:44.188646 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:48.116031 4116072 pod_workers.go:1258] ---------debug-------: sync begining    // Start executing the Sync*Pod method
I1215 16:49:48.233793 4116072 evented.go:289] ---------debug-------: container event type: ContainerDied // evented PLEG update container event
I1215 16:49:48.256265 4116072 evented.go:289] ---------debug-------: container event type: ContainerDied // evented PLEG update container event
I1215 16:49:48.430516 4116072 pod_workers.go:1287] ---------debug-------: sync terminating pod end // End execution of Sync*Pod methods and update lastSyncTime
I1215 16:49:48.430537 4116072 pod_workers.go:1256] ---------debug-------: waiting new event
I1215 16:49:53.116925 4116072 pod_workers.go:1258] ---------debug-------: sync begining
I1215 16:49:53.117309 4116072 pod_workers.go:1272] ---------debug-------: sync terminated pod end

From the logs, it is evident that PLEG updates container events while Sync*Pod is still in progress, and before lastSyncTime is updated.
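
As a minimal, self-contained illustration (toy types standing in for the kubelet pod cache, not its real implementation), the sketch below reproduces this sequence: the event is recorded before lastSyncTime is taken, so GetNewerThan can only be satisfied by the next periodic global update.

package main

import (
    "fmt"
    "sync"
    "time"
)

// toyCache mimics the GetNewerThan/UpdateTime semantics of the kubelet pod cache
// (a hypothetical simplification, not the real types).
type toyCache struct {
    mu        sync.Mutex
    cond      *sync.Cond
    global    time.Time            // last time the whole cache was known to be current
    podStatus map[string]time.Time // per-pod update timestamps
}

func newToyCache() *toyCache {
    c := &toyCache{podStatus: map[string]time.Time{}}
    c.cond = sync.NewCond(&c.mu)
    return c
}

// GetNewerThan blocks until the pod's entry or the global timestamp is newer than minTime.
func (c *toyCache) GetNewerThan(pod string, minTime time.Time) time.Time {
    c.mu.Lock()
    defer c.mu.Unlock()
    for {
        if t, ok := c.podStatus[pod]; ok && t.After(minTime) {
            return t
        }
        if c.global.After(minTime) {
            return c.global
        }
        c.cond.Wait()
    }
}

// Set records a pod-specific update, as the evented PLEG does on a CRI event.
func (c *toyCache) Set(pod string, at time.Time) {
    c.mu.Lock()
    c.podStatus[pod] = at
    c.mu.Unlock()
    c.cond.Broadcast()
}

// UpdateTime advances the global timestamp, as updateGlobalCache does periodically.
func (c *toyCache) UpdateTime(at time.Time) {
    c.mu.Lock()
    c.global = at
    c.mu.Unlock()
    c.cond.Broadcast()
}

func main() {
    cache := newToyCache()

    // The evented PLEG delivers ContainerDied while Sync*Pod is still running.
    go func() {
        time.Sleep(100 * time.Millisecond)
        cache.Set("pod-a", time.Now())
    }()

    // Periodic global update standing in for updateGlobalCache (1s with this PR, 5s before).
    go func() {
        for {
            time.Sleep(1 * time.Second)
            cache.UpdateTime(time.Now())
        }
    }()

    // The pod worker finishes Sync*Pod after the event arrived, so lastSyncTime is newer
    // than the event's timestamp and GetNewerThan blocks until the next global update.
    time.Sleep(300 * time.Millisecond)
    lastSyncTime := time.Now()
    start := time.Now()
    cache.GetNewerThan("pod-a", lastSyncTime)
    fmt.Printf("pod worker was blocked for %v\n", time.Since(start).Round(10*time.Millisecond))
}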

Although the current PR mitigates the issue by letting the podWorkerLoop be unblocked more frequently when evented PLEG is enabled, it masks the bug introduced with evented PLEG and might cause additional issues. Therefore, I believe the fix does not address the root cause.

If I am mistaken, please let me know.

@hshiina
Contributor Author

hshiina commented Dec 15, 2024

As pointed out, this PR is not a fundamental fix. Ideally, I think we should solve these TODOs:

// TODO: have KillPod return the terminal status of stopped containers and write that into the
// cache immediately

// TODO: this adds ~1s of latency on all transitions from sync to terminating
// to terminated, and on all termination retries (including evictions). We should
// improve latency by making the pleg continuous and by allowing pod status
// changes to be refreshed when key events happen (killPod, sync->terminating).

@HirazawaUi
Contributor

As pointed out, this PR is not a fundamental fix. Ideally, I think we should solve these TODOs:

EventedPLEG is currently an alpha feature, and it has been operating incorrectly. What we need is not a temporary fix but to ensure it functions as expected before it progresses to the beta stage. If possible, we should also consider revising the content of the KEP retrospectively.

@pacoxu
Member

pacoxu commented Dec 30, 2024

As pointed out, this PR is not a fundamental fix. Ideally, I think we should solve these TODOs.

Agreed. Can we merge this first, since it is a safe improvement to Evented PLEG?

  • Will increasing the frequency make existing issues more prominent? @hshiina

@HirazawaUi
Contributor

/test pull-kubernetes-e2e-kind-evented-pleg

@HirazawaUi
Contributor

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

@k8s-ci-robot
Contributor

@hshiina: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2 | 13edfa4 | false | /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e | 13edfa4 | false | /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hshiina
Contributor Author

hshiina commented Jan 18, 2025

/close
#129355 will fix this issue.

@k8s-ci-robot
Contributor

@hshiina: Closed this PR.

In response to this:

/close
#129355 will fix this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
area/kubelet
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
priority/important-longterm - Important over the long term, but may not be staffed and/or may need multiple releases to complete.
release-note-none - Denotes a PR that doesn't merit a release note.
sig/node - Categorizes an issue or PR as relevant to SIG Node.
size/S - Denotes a PR that changes 10-29 lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.