Node-Problem-Detector should Patch NodeStatus not Update #9

Closed
saad-ali opened this issue May 26, 2016 · 12 comments

@saad-ali

Problem:
kubelet, node-controller, and now the node-problem-detector all update the Node.Status field.

Normally this is not a problem: when writers race, the stale update fails on a resource version mismatch and everything is OK.

When a new field is added to Node Status, however:

  • Since kubelet and node-controller are in the same repository they are recompiled with the new field and Status updates continue to operate normally.
  • However, since node-problem-detector is in a separate repository and has not been recompiled with the new version, its Update calls end up squashing the new (unknown) field, resetting it to nil.
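
The failure mode, roughly (a minimal sketch using a recent client-go for illustration; the client in use in 2016 differed, and updateNodeCondition and its arguments are placeholders):

```go
import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// The whole Node is decoded into the detector's compiled-in Go types and then
// written back in full. Any status field added to the API after the detector
// was built is dropped at decode time and therefore cleared on the server.
func updateNodeCondition(ctx context.Context, cs kubernetes.Interface, nodeName string, cond v1.NodeCondition) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Mutate only the conditions we own...
	node.Status.Conditions = append(node.Status.Conditions, cond)
	// ...but UpdateStatus replaces the entire status object.
	_, err = cs.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
	return err
}
```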

Suggested Solution:
node-problem-detector should do a patch instead of an update to prevent wiping out fields it is not aware of. Incidentally, kubelet and node-controller should do this as well, but that is not as critical since they live in the same repository and new fields are normally added there first.
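
A minimal sketch of the patch-based alternative (again with a recent client-go for illustration; patchNodeCondition and its arguments are placeholders). Because NodeStatus.Conditions is merged by the "type" key on the server, a strategic merge patch carrying only the detector's own condition adds or updates it without touching any other status field, known or unknown:

```go
import (
	"context"
	"encoding/json"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

func patchNodeCondition(ctx context.Context, cs kubernetes.Interface, nodeName string, cond v1.NodeCondition) error {
	// A strategic merge patch carrying only our condition.
	patch, err := json.Marshal(map[string]interface{}{
		"status": map[string]interface{}{
			"conditions": []v1.NodeCondition{cond},
		},
	})
	if err != nil {
		return err
	}
	// PatchStatus sends the patch to the nodes/<name>/status subresource.
	_, err = cs.CoreV1().Nodes().PatchStatus(ctx, nodeName, patch)
	return err
}
```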

CC @kubernetes/sig-node @kubernetes/sig-api-machinery @Random-Liu @bgrant0607

Random-Liu added the bug label May 26, 2016
Random-Liu self-assigned this May 26, 2016
@caesarxuchao

You can use the Patch() method in the dynamic client, but you'll need to modify that function to allow specifying a subresource. cc @krousey
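
For reference: in current client-go releases the dynamic client's Patch already accepts subresources, so this modification is no longer needed. A minimal sketch, assuming a recent client-go, with patchBytes being the same strategic merge patch as above:

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

func patchNodeStatusDynamic(ctx context.Context, dyn dynamic.Interface, nodeName string, patchBytes []byte) error {
	nodesGVR := schema.GroupVersionResource{Version: "v1", Resource: "nodes"}
	// The trailing "status" argument targets the nodes/<name>/status subresource.
	_, err := dyn.Resource(nodesGVR).Patch(ctx, nodeName,
		types.StrategicMergePatchType, patchBytes, metav1.PatchOptions{}, "status")
	return err
}
```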

@lavalamp

I didn't know you guys were building this in a different repository. If you aren't going to be version-locked like the other components are, then you need some features that aren't built into our client.

Patch and/or dynamic client seems to be your best way forward.

There are various ways we could have made this easier, but they won't happen in time for your first release...

@Random-Liu

@saad-ali Thanks a lot for reporting!

You can use the Patch() method in the dynamic client, but you'll need to modify that function to allow specifying a subresource. cc @krousey

I'll try this!

@lavalamp Thanks a lot! Patch is good enough for now~ Let me try :)

@derekwaynecarr

I think we should also have a node e2e that verifies the Kubelet itself does not regress and drop node conditions it does not manage. I don't think we have anything that verifies that today, and it would help avoid regressions.


@dchen1107

Yes, I talked to @Random-Liu about an e2e test verifying this.

@bgrant0607

@saad-ali Ah, so it was getting clobbered. Great catch! Where multiple components update a resource, they'll need a way to patch, either via PATCH or via a dedicated subresource.

This is already a problem with node controller and Kubelet. I'm surprised we hadn't been bitten by it yet.

We have a similar problem with kubectl round-tripping, which we're working to eliminate.

@lavalamp

@bgrant0607 Yeah. Very surprising this is the first time we've hit this. BTW making kubectl not round-trip wouldn't be sufficient, but @krousey was actually switching everything to use the dynamic client, which will be sufficient to prevent this.

Also, this is the canary in the coal mine: everyone who built automation using our current client will have this problem in 1.3.

@Random-Liu

I think we should also have a node e2e that verifies the Kubelet itself does not regress and drop node conditions it does not manage. I don't think we have anything that verifies that today, and it would help avoid regressions.

Yeah, we are going to add a test for this. :)

@Random-Liu

Random-Liu commented May 26, 2016

FYI, I've tried several ways to patch NodeCondition, but the apiserver keeps returning a 405 error. I'll work with @caesarxuchao to figure out whether and how we can patch status correctly (Thanks! :)).

And I had an offline discussion with @saad-ali; to unblock him we'll:

Because node-problem-detector only touches NodeStatus, and kubernetes/kubernetes#26351 seems to be the only PR that recently added a new field to NodeStatus, this could be a temporary workaround.

In the meantime, I'll change the node-problem-detector to use Patch asap.

@dchen1107 @bgrant0607 @lavalamp Is this OK?

@Random-Liu

Random-Liu commented May 26, 2016

After discussing offline with @dchen1107, we will:

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue May 28, 2016
Automatic merge from submit-queue

Expose GET and PATCH for status subresource

We can do this for other status subresources. I only updated node/status in this PR to unblock kubernetes/node-problem-detector#9.

cc @Random-Liu @lavalamp
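
With PATCH exposed on the status subresource, the detector's request looks roughly like this at the REST level (a sketch assuming a recent client-go; before this change, the same request is what returned 405):

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func patchNodeStatusRaw(ctx context.Context, cs kubernetes.Interface, nodeName string, patchBytes []byte) error {
	// PATCH /api/v1/nodes/<name>/status with a strategic merge patch body.
	return cs.CoreV1().RESTClient().
		Patch(types.StrategicMergePatchType).
		Resource("nodes").
		Name(nodeName).
		SubResource("status").
		Body(patchBytes).
		Do(ctx).
		Error()
}
```
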
@wojtek-t

wojtek-t commented Jun 2, 2016

but @krousey was actually switching everything to use the dynamic client, which will be sufficient to prevent this.

@lavalamp - what do you mean by everything? If you really mean "everything", then we need to support protobufs in it, which we currently don't. @smarterclayton FYI

@lavalamp

lavalamp commented Jun 2, 2016

Yeah, the dynamic client only talks JSON right now, AFAIK.


saad-ali added a commit to saad-ali/kubernetes that referenced this issue Jun 2, 2016
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
  enable control by controller. Default enabled.
* It removes all references to the "SafeToDetach" annotation from the controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
  annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
  get reset every 30 seconds. Issue kubernetes/node-problem-detector#9
  opened to fix that.
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 3, 2016
…Changes

Automatic merge from submit-queue

Attach/Detach Controller Kubelet Changes

This PR contains changes to enable the attach/detach controller proposed in #20262.

Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references to the `SafeToDetach` annotation from the controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
  * There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue kubernetes/node-problem-detector#9 (comment) opened to fix that.
  * There is a bug here in the mount/unmount code that prevents resetting `VolumesInUse` in some cases; this will be fixed by the mount/unmount refactor.
* Makes the controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fixes misc bugs in the controller.
* Modifies the GCE attacher to remove retries, remove the mutex, and not fail if a volume is already attached or already detached.

Fixes #14642, #19953

```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.

A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" annotation or not.
```
mtaufen pushed a commit to mtaufen/kubernetes that referenced this issue Jun 6, 2016
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
  enable control by controller. Default enabled.
* It removes all references to the "SafeToDetach" annotation from the controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
  annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
  get reset every 30 seconds. Issue kubernetes/node-problem-detector#9
  opened to fix that.