Node-Problem-Detector should Patch NodeStatus not Update #9
I didn't know you guys were building this in a different repository. If you aren't going to be version-locked like the other components are, then you need some features that aren't built into our client. Patch and/or dynamic client seems to be your best way forward. There are various ways we could have made this easier, but they won't happen in time for your first release...
I think we should also have a node e2e that verifies the Kubelet itself.

On Thursday, May 26, 2016, Lantao Liu notifications@github.com wrote:
Yes, I talked to @Random-Liu about an e2e test verifying this.
@saad-ali Ah, so it was getting clobbered. Great catch! Where multiple components update a resource, they'll need a way to patch, either via PATCH or via a dedicated subresource. This is a problem already with the node controller and Kubelet. I'm surprised we hadn't been bitten by it yet. We have a similar problem with kubectl round-tripping, which we're working to eliminate.
@bgrant0607 Yeah. Very surprising this is the first time we've hit this. BTW making kubectl not round-trip wouldn't be sufficient, but @krousey was actually switching everything to use the dynamic client, which will be sufficient to prevent this. Also, this is the canary in the coal mine: everyone who built automation using our current client will have this problem in 1.3.
Yeah, we are going to add a test for this. :)
FYI, I've tried many ways to patch. And I had an offline discussion with @saad-ali; to unblock him we'll:
Because node-problem-detector only touches

In the meantime, I'll change the node-problem-detector to use Patch ASAP. @dchen1107 @bgrant0607 @lavalamp Is this OK?
Discussed offline with @dchen1107; we will:
Automatic merge from submit-queue

Expose GET and PATCH for status subresource

We can do this for other status subresources. I only updated node/status in this PR to unblock kubernetes/node-problem-detector#9. cc @Random-Liu @lavalamp
@lavalamp - what do you mean by everything? If you really mean "everything", then we need to support protobufs in it, which we currently don't. @smarterclayton FYI
Yeah, the dynamic client only talks JSON right now, AFAIK.

On Wed, Jun 1, 2016 at 11:27 PM, Wojciech Tyczynski wrote:
This PR contains Kubelet changes to enable attach/detach controller control.

* It introduces a new "enable-controller-attach-detach" kubelet flag to enable control by the controller. Default enabled.
* It removes all references to the "SafeToDetach" annotation from the controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of the SafeToDetach annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to get reset every 30 seconds. Issue kubernetes/node-problem-detector#9 opened to fix that.
Automatic merge from submit-queue

Attach/Detach Controller Kubelet Changes

This PR contains changes to enable the attach/detach controller proposed in #20262. Specifically it:

* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by the attach/detach controller. Default enabled.
* Removes all references to the `SafeToDetach` annotation from the controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of the `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
* There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue kubernetes/node-problem-detector#9 (comment) opened to fix that.
* There is a bug here in the mount/unmount code that prevents resetting `VolumesInUse` in some cases; this will be fixed by the mount/unmount refactor.
* Has the controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fixes misc bugs in the controller.
* Modifies the GCE attacher to remove retries, remove the mutex, and not fail if the volume is already attached or already detached.

Fixes #14642, #19953

```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation. A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" annotation or not.
```
Problem:

`kubelet`, `node-controller`, and now the `node-problem-detector` all update the `Node.Status` field. Normally this is not a problem because if changes happen rapidly, the resource-version mismatch fails the update and everything is ok.

When a new field is added to Node Status, however:

* Because `kubelet` and `node-controller` are in the same repository, they are recompiled with the new field and Status updates continue to operate normally.
* Because `node-problem-detector` is in a separate repository and has not been recompiled with the new version, its `Update` calls end up squashing the new (unknown) field, resetting it to `nil`.

Suggested Solution:

`node-problem-detector` should do a patch instead of an update to prevent wiping out fields it is not aware of. Incidentally, `kubelet` and `node-controller` should do this as well, but that is not as critical since they are in the same repository and new types are normally added there first.

CC @kubernetes/sig-node @kubernetes/sig-api-machinery @Random-Liu @bgrant0607