[Kubelet] Fix race condition in container manager #48123
Conversation
/test pull-kubernetes-node-e2e
/assign @dchen1107
The difference between this change and #48096 is that this change only touches the codepath for the storage resources, and doesn't try to refactor the existing resources to use the initial capacity values. I'll leave it up to someone more familiar with the node code to refactor the rest.
cc @marun
Looks good overall with some comments.
The difference between this change and #48096 is that this change only touches the codepath for the storage resources, and doesn't try to refactor the existing resources to use the initial capacity values. I'll leave it up to someone more familiar with the node code to refactor the rest.
Could you file an issue and leave a comment there explaining that? I think it's good to limit the scope of change in this PR since we're so close to the release, but using a different capacity source in the node status function is quite puzzling. Adding a TODO to fix this would be good.
}
}

if hasDedicatedImageFs, _ := cadvisorInterface.HasDedicatedImageFs(); hasDedicatedImageFs {
I found it a bit strange that with this feature, the image fs will no longer be dedicated even though we call it that... not really pertinent to this PR though.
@@ -225,3 +225,16 @@ func (cc *cadvisorClient) getFsInfo(label string) (cadvisorapiv2.FsInfo, error)

func (cc *cadvisorClient) WatchEvents(request *events.Request) (*events.EventChannel, error) {
	return cc.WatchForEvents(request)
}

// HasDedicatedImageFs returns true if the imagefs has a dedicated device.
func (cc *cadvisorClient) HasDedicatedImageFs() (bool, error) {
I am ok with this for now, but for the record, this would need to move back to a higher level package (again) once we let the runtime report imagefs info through CRI.
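For context, a minimal sketch of what this method could do, assuming the existing ImagesFsInfo() and RootFsInfo() accessors on the cadvisor client and a simple device comparison (illustrative, not necessarily the exact code in this diff):

```go
// Sketch: the imagefs is "dedicated" when it is backed by a different device
// than the root filesystem, according to cAdvisor's filesystem info.
func (cc *cadvisorClient) HasDedicatedImageFs() (bool, error) {
	imageFsInfo, err := cc.ImagesFsInfo()
	if err != nil {
		return false, err
	}
	rootFsInfo, err := cc.RootFsInfo()
	if err != nil {
		return false, err
	}
	return imageFsInfo.Device != rootFsInfo.Device, nil
}
```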
pkg/kubelet/cm/container_manager.go (Outdated)

@@ -60,6 +60,9 @@ type ContainerManager interface {
	// GetNodeAllocatable returns the amount of compute resources that have to be reserved from scheduling.
	GetNodeAllocatableReservation() v1.ResourceList

	// GetNodeCapacity returns the amount of compute resources tracked by container manager available on the node.
	GetCapacity() v1.ResourceList
I'm not sure why the "container manager" is responsible for supplying all the information. If that's the case, maybe we can move HasDedicatedImageFs to here instead?
My understanding is that currently in container manager initialization, it scans the system resources and keeps a cache of the capacity.
As for HasDedicatedImageFs, it seems like the only place it is called is during container manager initialization. Would it make more sense for it to just be a private function then?
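A small sketch of the alternative being floated here (illustrative names; the follow-up comments explain why the method ends up staying on the cAdvisor interface instead):

```go
package cm

// containerManager is an illustrative stand-in for the real container manager.
type containerManager struct {
	// dedicatedImageFs is computed once at construction and only read afterwards.
	dedicatedImageFs bool
}

// newContainerManager computes the flag serially, before anything else reads it,
// so no further synchronization is needed for this field.
func newContainerManager(imageFsDevice, rootFsDevice string) *containerManager {
	return &containerManager{
		dedicatedImageFs: imageFsDevice != rootFsDevice,
	}
}
```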
I tried removing the function from cadvisor.Interface, but it looks like it is required in the eviction.DiskInfoProvider interface.
Yes, it's used by the eviction manager. Moving the function requires passing the container manager to the eviction manager too.
This is ok for now.
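For reference, the eviction-side dependency mentioned above looks roughly like this (paraphrased from memory, not copied from this diff):

```go
// DiskInfoProvider tells the eviction manager how disk is configured on the node.
type DiskInfoProvider interface {
	// HasDedicatedImageFs returns true if the imagefs is on a separate device from the rootfs.
	HasDedicatedImageFs() (bool, error)
}
```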
pkg/kubelet/cm/container_manager.go (Outdated)

@@ -60,6 +60,9 @@ type ContainerManager interface {
	// GetNodeAllocatable returns the amount of compute resources that have to be reserved from scheduling.
	GetNodeAllocatableReservation() v1.ResourceList

	// GetNodeCapacity returns the amount of compute resources tracked by container manager available on the node.
s/GetNodeCapacity/GetCapacity
pkg/kubelet/kubelet_cadvisor.go (Outdated)

@@ -23,6 +23,8 @@ import (
	kubecontainer "k8s.io/kubernetes/pkg/kubelet/container"
)

// TODO: Move all these methods into the cAdvisor package to make kubelet package a consumer of libraries rather than a source.
Remove the TODO. Container stats needs a major refactor to support CRI anyway. Don't think a one-off TODO comment will help.
I added/removed some comments based on the review. PTAL @yujuhong
/approve Will LGTM after commits are squashed
Centralize Capacity discovery of standard resources in Container manager.
Have storage derive node capacity from container manager.
Move certain cAdvisor interfaces to the cAdvisor package in the process.

This patch fixes a bug in container manager where it was writing to a map without synchronization.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
3c82369 to 82f7820
Squashed
/test pull-kubernetes-bazel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dchen1107, msau42, yujuhong
Associated issue: 48045
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
cc/ @luxas Can you help verify the fix with kubeadm? Thanks!
/test all [submit-queue is verifying that this PR is safe to merge]
@msau42: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue (batch tested with PRs 48123, 48079)
This one has a cherry-pick conflict. I'll try to do a manual cherry-pick.
…#48123-upstream-release-1.7 Automated cherry pick of #48123
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
The PR caused the cross-build failures, including on the 1.7 branch.
Tracked by #48234
What this PR does / why we need it:
This fixes a race condition where the container manager capacity map was being updated without synchronization. It moves the storage capacity detection to kubelet initialization, which happens serially in one thread.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #48045

Release note:
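To make the race concrete, here is a minimal, self-contained sketch (not the kubelet code itself) of the pattern being fixed: writing to a plain Go map from one goroutine while others may read it is a data race, whereas populating the map once during serial initialization, before any readers start, needs no locking at all.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCapacity stands in for the container manager's capacity map in this sketch.
type nodeCapacity struct {
	resources map[string]int64
}

// racyUpdate shows the buggy shape: the map is mutated after readers may exist.
// A program that does this is flagged by `go run -race` as a data race.
func racyUpdate(nc *nodeCapacity) {
	go func() {
		nc.resources["storage"] = 100 << 30 // concurrent map write
	}()
}

// newNodeCapacity shows the fixed shape: all discovery happens serially at
// initialization, so the map is effectively read-only afterwards.
func newNodeCapacity() *nodeCapacity {
	nc := &nodeCapacity{resources: map[string]int64{}}
	nc.resources["cpu"] = 4
	nc.resources["memory"] = 8 << 30
	nc.resources["storage"] = 100 << 30 // storage detection moved to init
	return nc
}

func main() {
	nc := newNodeCapacity()

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = nc.resources["storage"] // concurrent reads of an unchanging map are safe
		}()
	}
	wg.Wait()
	fmt.Println(nc.resources)
}
```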