KEP-1967: promote size backed memory volumes to stable #126981
Conversation
/hold This is dependent on the KEP outcome.
/triage accepted
Force-pushed from a36c6bb to de67d25.
/retest
1 similar comment
/retest
Force-pushed from de67d25 to ee07a3f.
IIUC, a write to tmpfs creates dirty anonymous pages (which can't be flushed to disk for reclaim) attributed to the container's cgroup. If that cgroup were torn down for any reason, those pages are still anonymous and dirty, so they get accounted to the next-parent cgroup (I think?). If that happens, you can imagine a situation where the container tries to restart and immediately OOMs because those tmpfs files are accounted to the pod. What I am asking is whether we have thought through those sorts of failure modes. Do we ever tear down the container cgroup? Are we confident that the accounting stays with the container? If you all say we are satisfied with the testing of that, I'm satisfied, but I wanted to call it out as an area where I know there are traps :)
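(Not part of the PR, but to make the question concrete: a minimal Go sketch of how one could spot-check where those tmpfs pages are charged, assuming cgroup v2. The cgroup paths below are illustrative placeholders, not real kubelet names; compare the container-level and pod-level memory.current before and after a container restart.)

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readMemoryCurrent reads memory.current (bytes charged) for a cgroup
// directory under the cgroup v2 hierarchy. Spot-check helper, not kubelet code.
func readMemoryCurrent(cgroupDir string) (string, error) {
	b, err := os.ReadFile(cgroupDir + "/memory.current")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	// Hypothetical pod- and container-level cgroup paths; real names depend
	// on the cgroup driver, QoS class, pod UID, and container runtime.
	paths := []string{
		"/sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice",
		"/sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/cri-containerd-<id>.scope",
	}
	for _, p := range paths {
		v, err := readMemoryCurrent(p)
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		fmt.Printf("%s: %s bytes\n", p, v)
	}
}
```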
Force-pushed from bdaebb4 to d667b7c.
pkg/features/kube_features.go (outdated diff)
  // Enables kubelet support to size memory backed volumes
- SizeMemoryBackedVolumes featuregate.Feature = "SizeMemoryBackedVolumes"
+ SizeMemoryBackedVolumes featuregate.Feature = "SizeMemoryBackedVolumes" // remove in 1.35
Can you please add a comment saying that this FG is only used in the kubelet and is not needed for emulated versions, so that in a couple of releases it will be easier to understand that we can just remove it?
I will do this tonight or tomorrow.
done.
@thockin So I don't think we are tracking anything in the pod cgroup.
/lgtm
/approve
I think we resolved all the questions here.
LGTM label has been added. Git tree hash: c83d2e73530fa93baacae24f1164c091409c09a3
Force-pushed from 4eb7301 to b690c4f.
/retest
@thockin I asked @ndixita for her thoughts on the pod tracking. This is not a concern because we haven't been placing tracking information into the pod cgroup.
I took this item because of its long-outstanding status as a beta feature (quote is from you).
I am paraphrasing you here, but this feature has been in beta since 1.22. At this stage, I think any new issue related to emptyDir and tmpfs should be considered separate from this issue. This feature was about setting a size limit to node allocatable or the pod limit. It didn't change anything around how the cgroups are tracked across container restarts. I think the feature works as intended, and we have e2e tests that verify the limits.
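(As a sketch of that sizing rule — not the actual kubelet code; the names and the zero-means-unset convention below are illustrative — the tmpfs ends up sized to the smallest applicable bound among the volume's sizeLimit, the pod memory limit, and node allocatable memory:)

```go
package main

import "fmt"

// tmpfsSize sketches the rule described above: a memory-backed emptyDir is
// sized to the smallest applicable bound among the volume's sizeLimit, the
// pod memory limit, and node allocatable memory. Zero means "unset" here.
func tmpfsSize(sizeLimit, podMemLimit, nodeAllocatable int64) int64 {
	size := nodeAllocatable
	for _, bound := range []int64{podMemLimit, sizeLimit} {
		if bound > 0 && (size == 0 || bound < size) {
			size = bound
		}
	}
	return size
}

func main() {
	fmt.Println(tmpfsSize(0, 256<<20, 8<<30))      // no sizeLimit: pod limit wins (268435456)
	fmt.Println(tmpfsSize(64<<20, 256<<20, 8<<30)) // explicit sizeLimit wins (67108864)
}
```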
Yes. It looks like if a container hits an OOM limit with tmpfs, then it will be stuck in an error state. I don't think this feature introduced that, though.
Even if we want to say that a memory-backed volume must have a limit set and it must be lower than the container limit, we will not be able to enforce it on the API side without breaking existing Pods. So the only option would be for the kubelet to override the memory-backed volume size limit to the container's limit (when defined). This may help, but it also will not guarantee we get out of the "OOMLoopBackoff", as code bootstrap can be quite memory-heavy. I would see this as outside the KEP scope, but it could be a good future enhancement. @thockin do you agree?
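(A minimal sketch of that possible future enhancement — explicitly out of scope for this PR; the names below are illustrative, not proposed kubelet code:)

```go
package main

import "fmt"

// clampToContainerLimit sketches the enhancement floated above: if a
// container defines a memory limit lower than the memory-backed volume's
// size, clamp the tmpfs size down to that limit. Zero means the container
// has no memory limit.
func clampToContainerLimit(volumeSize, containerLimit int64) int64 {
	if containerLimit > 0 && containerLimit < volumeSize {
		return containerLimit
	}
	return volumeSize
}

func main() {
	fmt.Println(clampToContainerLimit(512<<20, 128<<20)) // clamped: 134217728
	fmt.Println(clampToContainerLimit(512<<20, 0))       // no container limit: 536870912
}
```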
Thanks for doing the homework. I know how baroque some of the corner-cases are.
Thanks! /lgtm
LGTM label has been added. Git tree hash: 475bc53422b98905d0504291c826b62bd334a5a3
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kannon92, SergeyKanzhelev, thockin
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Promote KEP-1967 to stable.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: