Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/ciliumidentity: skip CID creation for unmanaged pods #36779

Merged
merged 1 commit into from
Feb 12, 2025

Conversation

jshr-w
Copy link
Contributor

@jshr-w jshr-w commented Dec 24, 2024

Operator should only create CiliumIdentities for pods managed by Cilium, specifically here, skip creating CIDs for pods using host networking.

Fixes: #35402

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 24, 2024
@jshr-w jshr-w force-pushed the jshr/skipcidcreateunmanaged branch from c325f61 to f07171f Compare December 24, 2024 00:41
@jshr-w jshr-w force-pushed the jshr/skipcidcreateunmanaged branch 4 times, most recently from b4d0a8f to 7513aea Compare January 8, 2025 02:56
@jshr-w jshr-w marked this pull request as ready for review January 8, 2025 03:02
@jshr-w jshr-w requested a review from a team as a code owner January 8, 2025 03:02
@jshr-w jshr-w requested a review from tklauser January 8, 2025 03:02
@jshr-w
Copy link
Contributor Author

jshr-w commented Jan 8, 2025

cc @ovidiutirla

@ovidiutirla
Copy link
Contributor

/test

@ovidiutirla ovidiutirla requested a review from dlapcevic January 8, 2025 09:22
@dlapcevic
Copy link
Contributor

If we skip pods that don't have CEP/CES, then we lose the benefit of the CID controller creating CIDs as soon as pod objects are created.

After a deeper look, it turns out that the way Cilium handles unmanaged pods, is that it just tries to restart them to be able to start managing them -- https://github.com/cilium/cilium/blob/main/operator/cmd/k8s_pod_controller.go#L68.

We should verify if it's ok to create CIDs for all pods, because it's not expected to have any long running unmanaged pods. In that case we can close the issue without any code changes.

cc @squeed

@joestringer
Copy link
Member

Yeah I think it would help to have a bit more detail on the rationale for this change, for instance in which situation you are expecting there to be significant numbers of unmanaged Pods. Assuming we want to support both scenarios it may make sense to add a flag for this behavior.

@tklauser
Copy link
Member

Marking this PR as draft until more details are provided on the rationale for this change, see Joe's comment. This will keep the PR out of reviewer's queues. Please feel free to move out of draft again once the rationale is provided and reviews are solicited.

@tklauser tklauser marked this pull request as draft January 23, 2025 09:31
@jshr-w
Copy link
Contributor Author

jshr-w commented Jan 23, 2025

@ovidiutirla @dlapcevic
I don't have the full context for the rationale for this issue, so I want to verify with you both here.

I think the key question here was whether there are expected to be any long-lived unmanaged pods. My best understanding of the answer is that the expected long-lived unmanaged pods are hostNetwork pods, although I'm not sure if these would be created in a significant number. (Also, if this is the only case we care about, since other times, pods are restarted, it would be simpler to just skip hostNetwork pods rather than checking for a CEP/CES I think)

It seems like in the current set-up, if you create a host networking pod, it tries to create a CID (15 times), which doesn't seem optimal in cases with more host networking pods.

time=2025-01-23T23:23:18Z level=debug msg="Upsert internal mapping" module=operator.operator-controlplane.leader-lifecycle.k8s-cid-controller ciliumIdentityName=9421
time=2025-01-23T23:23:18Z level=info msg="CID allocated for pod" module=operator.operator-controlplane.leader-lifecycle.k8s-cid-controller k8sPodName=default/pod-worker ciliumIdentityName=9421 oldIdentity="" labels="map[app:pod-worker io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name:default io.cilium.k8s.policy.cluster:kind-kind io.cilium.k8s.policy.serviceaccount:default io.kubernetes.pod.namespace:default]"
time=2025-01-23T23:23:18Z level=info msg="Creating CID" module=operator.operator-controlplane.leader-lifecycle.k8s-cid-controller labels="map[k8s:app:pod-worker k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name:default k8s:io.cilium.k8s.policy.cluster:kind-kind k8s:io.cilium.k8s.policy.serviceaccount:default k8s:io.kubernetes.pod.namespace:default]" ciliumIdentityName=9421
time=2025-01-23T23:23:18Z level=warn msg="Failed to process resource item" module=operator.operator-controlplane.leader-lifecycle.k8s-cid-controller key=9421 retries=0 maxRetries=15 error="ciliumidentities.cilium.io is forbidden: User \"system:serviceaccount:kube-system:cilium-operator\" cannot create resource \"ciliumidentities\" in API group \"cilium.io\" at the cluster scope"

whereas with this change we would neatly skip out

time=2025-01-23T23:32:56Z level=debug msg="Skipping Pod event for unmanaged pod" module=operator.operator-controlplane.leader-lifecycle.k8s-cid-controller k8sPodName=default/pod-worker

Just wanted to make sure this seems reasonable to you all before making changes and get your thoughts on this before making any changes :)

Edit: Ah I realized I was missing a clusterrole, thanks Ovidiu :)
I think most of my question still stands, it seems that for a hostnetworking pod, a CID will be created in this mode (no errors 😅), though it is not when the agent creates identities. So it may make sense to focus on this type of unmanaged pod?

@jshr-w jshr-w force-pushed the jshr/skipcidcreateunmanaged branch from 7513aea to 3692a62 Compare January 29, 2025 00:48
@ovidiutirla
Copy link
Contributor

Would be good to clarify this aspect, if it's not relevant, let's close the corresponding issue and unblock the progress for making operator managing CIDs GA.

/cc @marseel @dlapcevic @bowei

When operator manages identities, ignore pod events when pod is not
managed by Cilium, in this case, using host networking.

Signed-off-by: jshr-w <shjayaraman@microsoft.com>
@jshr-w jshr-w force-pushed the jshr/skipcidcreateunmanaged branch from 3692a62 to 06cea2c Compare February 5, 2025 20:52
@jshr-w jshr-w marked this pull request as ready for review February 5, 2025 20:52
@tklauser
Copy link
Member

/test

@dlapcevic dlapcevic added the release-note/misc This PR makes changes that have no direct user impact. label Feb 12, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 12, 2025
@dlapcevic dlapcevic added this pull request to the merge queue Feb 12, 2025
Merged via the queue into cilium:main with commit a579093 Feb 12, 2025
65 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator should ignore unmanaged pods when generating CIDs
5 participants