kubelet: get cgroup driver config from CRI #118770
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is an early version of the PR, especially targeted for bike-shedding on the CRI API naming. It also contains a full implementation (sans unit tests) to see how the new CRI rpc would be used.
Changelog suggestion for when this is ready: Added automatic detection of the cgroup driver for Linux nodes.
With the `KubeletCgroupDriverFromCRI` feature gate enabled, and a sufficiently new version of a container
runtime, the kubelet automatically detects the cgroup driver configuration from the container runtime,
eliminating the need to specify the `cgroupDriver` configuration option (or the deprecated `--cgroup-driver`
command line argument) for the kubelet. I mentioned Linux because Windows nodes don't have cgroups.
cmd/kubelet/app/server.go (outdated)

        return err
    }
    // CRI implementation doesn't support RuntimeConfig, fallback
    klog.InfoS("CRI implementation should be updated to support RuntimeConfig when KubeletCgroupDriverFromCRI feature gate has been enabled. Falling back to using cgroupDriver from kubelet config.")
For beta, maybe we can even record an Event into the API?
Might be good for the user. The proposal now states that for beta: "Drop fallback to old behavior. CRI implementations expected to have support." We can of course change that, or add an event for alpha already 🧐
    if utilfeature.DefaultFeatureGate.Enabled(features.KubeletCgroupDriverFromCRI) {
        if err := getCgroupDriverFromCRI(ctx, s, kubeDeps); err != nil {
            return err
        }
    }
Do we do this:
- per new connection to the socket?
- per kubelet startup?
I'm thinking that if someone restarts the process at the other end of the socket, we should re-query this setting.
This happens only at kubelet startup.
Technically we could query it at CRI reconnection but I wouldn't go there.
- The cgroup driver is really not something that's expected to be changed without taking the node down. Changing it correctly requires restarting all pods, including the static ones. A system reboot is a safe method.
- I don't know what all would break in kubelet (internally) if we changed the cgroup-driver in-flight
We could re-query the cgroup driver at re-connection and exit if the cgroup driver setting has changed. But I'm not sure if that's desired. Thoughts @SergeyKanzhelev @mrunalp @haircommander
I think our efforts would be better spent making it very clear a change in the cgroup driver in the CRI requires a node reboot
    }
    // CRI implementation doesn't support RuntimeConfig, fallback
    klog.InfoS("CRI implementation should be updated to support RuntimeConfig when KubeletCgroupDriverFromCRI feature gate has been enabled. Falling back to using cgroupDriver from kubelet config.")
    return nil
can there be a comment saying where the cgroup driver is coming from/set by if this case happens
It now states "Falling back to using cgroupDriver from kubelet config." Do you suggest something else/in addition? @haircommander
    case runtimeapi.CgroupDriver_CGROUPFS:
        s.CgroupDriver = "cgroupfs"
    default:
        return fmt.Errorf("runtime returned an unknown cgroup driver %d", d)
I don't know what the kubelet's policy on panics are, but this seems like a prime reason to panic as it's a programming error that should never happen. I'm not stubborn about that though
Yeah, it's a programming error, but on the CRI runtime's side, out of our (kubelet's) control, so I'm not sure we want to panic.
let's fall back to the configuration file for now
Thinking more... this should be treated differently than if runtime is not available, for example, or returned error, going forward. I think if runtime returned error we will need to retry a few times and fallback to config. If it's a new unknown value, kubelet should not retry and should exit immediately.
good idea, is this what you were thinking @SergeyKanzhelev
@mrunalp I added it bc of @SergeyKanzhelev 's comment. I am personally fine without it, i'll let y'all decide :)
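The policy discussed above (map the known enum values, and treat an unknown value as a hard error rather than retrying) can be sketched roughly as below. The type and constant names mirror the CRI `runtimeapi` package but are redefined here so the sketch is self-contained; this is an illustration, not the actual kubelet code:

```go
package main

import "fmt"

// CgroupDriver mirrors the CRI enum for illustration only; the real
// values live in the runtimeapi package generated from the CRI proto.
type CgroupDriver int32

const (
	CgroupDriver_SYSTEMD  CgroupDriver = 0
	CgroupDriver_CGROUPFS CgroupDriver = 1
)

// cgroupDriverName maps the CRI enum to the kubelet config string.
// An unknown value is a hard error: per the discussion above, the
// kubelet should fail fast instead of retrying or silently falling back.
func cgroupDriverName(d CgroupDriver) (string, error) {
	switch d {
	case CgroupDriver_SYSTEMD:
		return "systemd", nil
	case CgroupDriver_CGROUPFS:
		return "cgroupfs", nil
	default:
		return "", fmt.Errorf("runtime returned an unknown cgroup driver %d", d)
	}
}

func main() {
	name, err := cgroupDriverName(CgroupDriver_SYSTEMD)
	fmt.Println(name, err)

	_, err = cgroupDriverName(CgroupDriver(42))
	fmt.Println(err)
}
```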
@haircommander @mrunalp @mikebrow @sftim any thoughts on the CRI API naming? The main addition is now

    rpc RuntimeConfig(RuntimeConfigRequest) returns (RuntimeConfigResponse)

But then we also have

    rpc UpdateRuntimeConfig(UpdateRuntimeConfigRequest) returns (UpdateRuntimeConfigResponse) {}

    message UpdateRuntimeConfigRequest {
        RuntimeConfig runtime_config = 1;
    }

So the naming is easily clashing...
yeah, I guess. An issue though is that this runtime config has nothing to do with UpdateRuntimeConfig 🙃
naming is hard... over on the status API we currently have the ability to provide any desired runtime config response contents, at least for prototyping; we just need to agree on the nvp graph/map format, esp. with the status verbose content... see the crictl "info" response.
The original idea for UpdateRuntimeConfig was to add more optional structs as we added more runtime configuration options for the kubelet. Maybe we don't need new APIs so much as we need to add fields to the existing update structs and consider some constants/graph/map format for status info.
Not within scope here: something we should think about long term is the direction of data. UpdateRuntimeConfig tells the CRI to update (kubelet → CRI). RuntimeConfig is the kubelet asking the CRI what to do (CRI → kubelet). As time goes on, we probably want to choose one direction of information.
After discussion with @haircommander: perhaps the new runtime service CRI API should be StaticRuntimeConfig, to cover retrieving the cgroup driver for now, and we can add more content that won't change without a reboot of the container runtime as needed...
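To make the naming clash concrete, here is a sketch of how the proposed rpc would sit next to the existing one. The message layout below is illustrative, not the final API shape:

```protobuf
// Existing: kubelet pushes configuration to the runtime (kubelet -> CRI).
rpc UpdateRuntimeConfig(UpdateRuntimeConfigRequest) returns (UpdateRuntimeConfigResponse) {}

// Proposed: kubelet queries configuration from the runtime (CRI -> kubelet).
// A name like StaticRuntimeConfig, as suggested above, would avoid colliding
// with the existing RuntimeConfig message used by UpdateRuntimeConfigRequest.
rpc RuntimeConfig(RuntimeConfigRequest) returns (RuntimeConfigResponse) {}

message RuntimeConfigRequest {}

message RuntimeConfigResponse {
    // For now only the cgroup driver (systemd or cgroupfs) would be
    // reported; the exact field layout here is an assumption.
    LinuxRuntimeConfiguration linux = 1;
}
```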
Force-pushed from 4749ca6 to b8f1cb4.
This patch refactors the kubelet startup code to initialize the runtime service earlier in the startup sequence. We want this in order to be able to query the cgroup driver setting from the CRI before initializing the cgroup manager.
This patch modifies kubelet to get the cgroup driver setting from the CRI runtime using the newly added RuntimeConfig rpc. The new code path only takes place if the KubeletCgroupDriverFromCRI feature gate is enabled. If the runtime returns a not-implemented error, kubelet falls back to using the cgroupDriver configuration option, with a log message instructing the user to upgrade to a newer container runtime. Other rpc errors cause kubelet to exit, as is the case if the runtime returns an unknown cgroup driver.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Force-pushed from c83aa53 to d79d775.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Force-pushed from d79d775 to bfa62e0.
/retest
/lgtm
/approve
A few notes:
- The call to PreInitRuntimeService was moved to earlier in the flow unconditionally. This changes the behavior when the feature gate is not set. Looking at what is being done between the new call site and the old call site, the risk that it will break something is very low.
- The changelog file for the CRI API will need to be updated. Close to code freeze, it's best to do that as a follow-up, as there is another PR updating it now.
LGTM label has been added. Git tree hash: 51a623a1e04224820eee47f0c4c660229b07713b
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marquiz, mikebrow, mrunalp, SergeyKanzhelev. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
@marquiz is the release note still WIP or can we remove that prefix?
I removed the wip prefix
Thank you @marquiz! 🙏
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR implements kubelet cgroup driver detection over the CRI API, implemented in multiple patches:
This patch adds a new rpc to the runtime service to query CRI runtime
configuration options. For now, it only contains one field for getting
the cgroup driver (systemd or cgroupfs) to be used.
Also update the CRI RuntimeService interface to include the new
RuntimeConfig rpc.
This patch refactors the kubelet startup code to initialize the runtime
service earlier in the startup sequence. We want this to be able to
query the cgroup driver setting from the CRI before initializing the
cgroup manager.
This patch modifies kubelet to get the cgroup driver setting from the
CRI runtime using the newly added RuntimeConfig rpc. The new code path
only takes place if the KubeletCgroupDriverFromCRI feature gate is
enabled. If the runtime returns a not-implemented error kubelet falls
back to using the cgroupDriver configuration option, with a log message
instructing the user to upgrade to a newer container runtime. Other rpc
errors cause kubelet to exit as is the case if the runtime returns an
unknown cgroup driver.
Which issue(s) this PR fixes:
Refs kubernetes/enhancements#4033
Special notes for your reviewer:
Still work in progress: most importantly unit tests are not included.
If so desired, the changes to cri-api could be submitted as a separate PR.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: