Do not clear state of pods pending admission for CPU/Memory/Device manager #103979
Merged: k8s-ci-robot merged 3 commits into kubernetes:master from cynepco3hahue:save_admitted_pods on Aug 9, 2021.
I think this is only partially correct. You need to prune the admitted pods map so it only includes pods that are still in either config or the pod worker, but not in both (a pod that is in the pod worker is by definition admitted). You also want to exclude pods that are no longer running due to termination (which means we will never start their containers again, so we can reuse those CPUs). You then need to take that pruned list of admitted pods and add it to the active pods to get the list of "pods that should be considered to have guaranteed resources allocated".
I think we'll need to add a method on the kubelet to make this easier; I'm looking at what that would be right now.
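A rough sketch of the pruning described above might look like the following; the names here (podsWithAllocatedResources, pendingAdmission, couldHaveRunningContainers) are illustrative assumptions, not the actual kubelet API:

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// podsWithAllocatedResources unions the active pods with the pruned set of pods
// pending admission. couldHaveRunningContainers stands in for whatever would
// answer "might the pod worker still (re)start containers for this UID?".
func podsWithAllocatedResources(
	activePods []*v1.Pod,
	pendingAdmission map[types.UID]*v1.Pod,
	couldHaveRunningContainers func(types.UID) bool,
) []*v1.Pod {
	seen := make(map[types.UID]struct{}, len(activePods))
	for _, p := range activePods {
		seen[p.UID] = struct{}{}
	}
	out := append([]*v1.Pod{}, activePods...)
	for uid, p := range pendingAdmission {
		if _, ok := seen[uid]; ok {
			continue // already admitted and tracked via the active list
		}
		if !couldHaveRunningContainers(uid) {
			// The pod terminated and will never run again, so its exclusive
			// CPUs/memory can be reclaimed; drop it from the pending map.
			delete(pendingAdmission, uid)
			continue
		}
		out = append(out, p)
	}
	return out
}
```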
Looking at this, I think we have a bug: these components need to know about pods that are terminating after having been force deleted (which activePods() does not include). @Random-Liu I think that components like the cpu_manager, memory_manager, qos_container_manager, and device_manager actually need to know about pods that have been force deleted but are still running. To do that, I'm going to need to enable the pod worker to share that info safely (and consistently, I think).
I'm not sure what you suggest here is necessary. As you mentioned in your comment here, all we should really need to care about is whether the pod is:
(1) in the active pods list; or
(2) the pod currently waiting to be admitted.
Assuming logic exists to atomically add an admitted pod to the active pod list before starting the next iteration of the admission loop, I believe this should be sufficient. If the pod doesn't make it into the active list, then we shouldn't be tracking it anyway, and if it's ever removed from the list, we should be done tracking it.
We then just need to track a single variable holding the "currently being admitted" pod, which gets overwritten each time a new iteration of the admit loop starts.
The check in removeStaleState() would then remove any state not currently associated with a pod in the active list plus this new variable. Of course, this is all contingent on what I said before. Maybe there's a good reason not to do this?
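A minimal sketch of that bookkeeping, assuming a deliberately simplified manager struct (the real cpu/memory/device manager state APIs differ), might look like:

```go
package example

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// manager is a simplified stand-in for the CPU/memory/device manager; field and
// method names are illustrative only.
type manager struct {
	mutex               sync.Mutex
	activePods          func() []*v1.Pod
	pendingAdmissionPod *v1.Pod
	// assignments maps pod UID -> container name -> an opaque resource assignment.
	assignments map[string]map[string]struct{}
}

// SetPodPendingAdmission is called at the start of every admission iteration and
// simply overwrites whatever pod was stored previously.
func (m *manager) SetPodPendingAdmission(pod *v1.Pod) {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	m.pendingAdmissionPod = pod
}

// removeStaleState drops assignments for any pod that is neither in the active
// pod list nor the single pod currently pending admission.
func (m *manager) removeStaleState() {
	m.mutex.Lock()
	defer m.mutex.Unlock()

	keep := make(map[string]struct{})
	for _, pod := range m.activePods() {
		keep[string(pod.UID)] = struct{}{}
	}
	if m.pendingAdmissionPod != nil {
		keep[string(m.pendingAdmissionPod.UID)] = struct{}{}
	}
	for podUID := range m.assignments {
		if _, ok := keep[podUID]; !ok {
			delete(m.assignments, podUID) // release resources held by stale pods
		}
	}
}
```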
The bug is that the active pods list does not include force-deleted pods, which may take tens of minutes to finish terminating and may still have running containers pinned to certain CPUs.
Is that a problem though? I would think we can safely remove the CPUs assigned to them then, and the next time around the reconcile loop, their containers will be moved onto non-exclusive CPUs.
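To illustrate the fallback being described here, a toy stand-in (not the actual CPU manager reconcile loop or cpuset.CPUSet type) for the "get the assignment or the shared default" lookup:

```go
package main

import "fmt"

// fakeState is a toy stand-in for the CPU manager state: defaultSet is the shared
// (non-exclusive) pool, assignments holds exclusive pinnings per pod/container.
type fakeState struct {
	defaultSet  string
	assignments map[string]string
}

// cpuSetOrDefault mirrors the idea behind a "get CPU set or default" lookup: once
// a stale assignment has been removed, the reconcile loop sees only the shared pool.
func (s *fakeState) cpuSetOrDefault(key string) string {
	if set, ok := s.assignments[key]; ok {
		return set
	}
	return s.defaultSet
}

func main() {
	s := &fakeState{
		defaultSet:  "0-1",
		assignments: map[string]string{"pod-uid/ctr": "2-3"},
	}
	fmt.Println(s.cpuSetOrDefault("pod-uid/ctr")) // "2-3": still pinned exclusively
	delete(s.assignments, "pod-uid/ctr")          // removeStaleState reclaimed the CPUs
	fmt.Println(s.cpuSetOrDefault("pod-uid/ctr")) // "0-1": next reconcile uses the shared pool
}
```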
As far as I understand/can tell, this is the most pressing question now. I fully agree with the general idea of restoring the old behaviour (which, we have learned, does not cover some important cases) and iterating later to actually cover them. The former is important and urgent; the latter is important as well, but less urgent.
ActivePods() never returned pods that were force deleted. So you have always been broken on that.
In this case, can we restore at least the old behavior under the current PR, and once we have the infrastructure to get force-deleted pods, improve the logic under the resource managers?
I opened an issue to track the force-deleted pods problem: #104099
I agree this does not block this change, and the fix for #104099 should probably make the code changes here simpler (but we don't have to do it now).