Alpha Dynamic Kubelet Configuration #46254

Merged: 3 commits merged on Aug 9, 2017

Conversation

Contributor

@mtaufen mtaufen commented May 22, 2017

Feature: kubernetes/enhancements#281

This PR contains the alpha implementation of the Dynamic Kubelet Configuration feature proposed in #29459 (community/contributors/design-proposals/dynamic-kubelet-configuration.md).

Please note:

  • The proposal doc is not yet up to date with this implementation; there are some subtle differences and some more significant ones. I will update the proposal doc to match by tomorrow afternoon.
  • This obviously needs more tests. I plan to write several O(soon). Since it's alpha and feature-gated, I'm decoupling this review from the review of the tests. I've beefed up the unit tests, though there is still plenty of testing to be done.
  • I'm temporarily holding off on updating the generated docs, API specs, etc., for the sake of my reviewers 😄. These files now live in a separate commit; the first commit is the one to review.

/cc @dchen1107 @vishh @bgrant0607 @thockin @derekwaynecarr

Adds (alpha feature) the ability to dynamically configure Kubelets by enabling the DynamicKubeletConfig feature gate, posting a ConfigMap to the API server, and setting the spec.configSource field on Node objects. See the proposal at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dynamic-kubelet-configuration.md for details.
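
As a rough illustration of that workflow from the client side (a sketch only: the kubeconfig path, node name, ConfigMap name, and the configMapRef field layout are assumptions based on the alpha API in this PR, and the client-go call signatures shown are era-specific):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a kubeconfig (path is illustrative).
	restCfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(restCfg)
	if err != nil {
		panic(err)
	}
	// Point the node at a ConfigMap holding the serialized KubeletConfiguration.
	patch := []byte(`{"spec":{"configSource":{"configMapRef":{"namespace":"kube-system","name":"my-kubelet-config","uid":"<configmap-uid>"}}}}`)
	if _, err := client.CoreV1().Nodes().Patch("my-node", types.StrategicMergePatchType, patch); err != nil {
		panic(err)
	}
	fmt.Println("node my-node now references ConfigMap my-kubelet-config")
}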

@mtaufen mtaufen added the area/kubelet, area/kubelet-api, and sig/node (Categorizes an issue or PR as relevant to SIG Node) labels May 22, 2017
@mtaufen mtaufen added this to the v1.7 milestone May 22, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label May 22, 2017
@k8s-github-robot k8s-github-robot added the kind/api-change (Categorizes issue or PR as related to adding, removing, or otherwise changing an API), size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files), and release-note (Denotes a PR that will be considered when it comes time to generate release notes) labels May 22, 2017
if err != nil {
return err
}
s.KubeletConfiguration = kcToUseInternal
Member

A few thoughts on the structure of this:

  • can we separate the dynamic config setup into a separate function/file to make the inputs (config and client) and outputs (controller and resulting config) clearer?
  • mutating the config in the KubeletServer object makes this hard to reason about. can you hold the dynamically fetched config locally and propagate it to where it needs to go instead?
  • I would expect all the dynamic config lookup to happen after the BootstrapKubeconfig handling... otherwise, bootstrapped kubelets won't have a client yet.

Contributor Author

I can split that into another function in server.go.

KubeletServer already gets propagated to where the config is needed, and this concept is deeply threaded through the server.go code. I'm open to discussing whether this is the right flow to have, but I think it's out of scope for this PR - which is really more focused on the "obtain a valid config" part vs. the "pass that config around" part.

I'll look into moving the setup to after the BootstrapKubeconfig handling.


var ncc *nodeconfig.NodeConfigController
if useDynamicConfig {
client, err := getKubeClient(s)
Member

this will fail in TLS bootstrap cases where the kubelet doesn't have a client yet

Contributor Author

I'll look into moving this after the TLS bootstrap.

Contributor Author

I was under the impression that TLS bootstrap shouldn't depend on the KubeletConfiguration, but who was I kidding, everything depends on the KubeletConfiguration ;). It needs ContentType, KubeAPIQPS, and KubeAPIBurst. This creates a mild chicken-egg problem.

The present "solution" (imperfect) is to do the bare minimum to get a client based on the config supplied at startup and initialize the config controller with this client, then allow the client to be regenerated once we have a validated config.

This still leaves the config controller using the old client for its sync-loop, which means the QPS, etc. parameters might not be uniform between the clients. I'll look into splitting the start of the config sync-loop out from the controller's Run() function, so that any new client can be injected there when it is available.
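
A minimal sketch of that flow under the stated assumptions (buildKubeClient is an illustrative helper, not the PR's function; the point is only that the same constructor can be called with the startup values and again with the validated ones):

package clientsketch // illustrative package, not part of this PR

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// buildKubeClient builds a clientset from a kubeconfig plus the API-call parameters carried
// in the KubeletConfiguration. The controller can be bootstrapped with a client built from
// the startup config, and once a config has been validated the same helper can be called
// again so the sync loop gets a client with up-to-date ContentType/QPS/Burst values.
func buildKubeClient(kubeconfigPath, contentType string, qps float32, burst int) (kubernetes.Interface, error) {
	restCfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return nil, err
	}
	restCfg.ContentType = contentType
	restCfg.QPS = qps
	restCfg.Burst = burst
	return kubernetes.NewForConfig(restCfg)
}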

Member (@liggitt, May 25, 2017)

I was under the impression that TLS bootstrap shouldn't depend on the KubeletConfiguration, but who was I kidding, everything depends on the KubeletConfiguration ;). It needs ContentType, KubeAPIQPS, and KubeAPIBurst. This creates a mild chicken-egg problem.

I don't see those settings being required. From what I can see, the kubelet TLS bootstrap requires this information:

  • bootstrap kubeconfig file
  • path to write resulting kubeconfig to
  • dir to write resulting client certificate to
  • name of the node (which requires the following info to determine)
    • cloud provider
    • cloud provider config file
    • hostname override

Those files and paths don't seem like something you'd need to be dynamic, and the name of the node is needed to look up the dynamic config in the first place.

Contributor Author (@mtaufen, May 25, 2017)

Yeah, you're right. I was erroneously thinking of "get a client" as somehow involved because I saw that path reading the kubeconfig file. It's the client construction that needs the API-call config parameters I mentioned above.

The cert dir, cloud provider, and cloud provider config file path, however, are still on the KubeletConfiguration. I wasn't planning on touching the KubeletConfiguration type itself in this PR, since I had another PR for refactoring that... but I suppose it doesn't hurt to remove a few parameters from it that clearly shouldn't be dynamic - especially given it means we won't have to worry about regenerating certs.

Contributor Author

OK, those fields are yanked; no cert regen.
Clients are regenerated from the latest config so that the QPS, etc. parameters are up to date, and then the latest client is injected when starting the sync loop.

return err
// when you run the controller, either you get back a valid config to use, or the Kubelet crashes because
// something was fatally wrong with the configuration. Non-fatal errors will be logged, but not returned from Run().
kcToUse := ncc.Run()
Member

I'd still expect an error returned, not an internal panic

Contributor Author

If the Kubelet cannot reliably determine a configuration to use, it should refuse to start.

Member

I don't disagree, but that should happen via a bubbling error, not a panic deep in a config library

// If specified, the source to get node configuration from
// The DynamicKubeletConfig feature gate must be enabled for the Kubelet to use this field
// +optional
ConfigSource *NodeConfigSource `json:"configSource,omitempty" protobuf:"bytes,6,opt,name=configSource"`
Member

How do we determine whether a node is permitted to access a particular ConfigMap? Nodes create their own API objects on startup, if needed. Should they be permitted to create a Node API object that references a ConfigMap config source? That's letting a node escalate its view permissions.

Contributor Author

A Node can use the configuration from a particular ConfigMap if its Kubelet can read that ConfigMap. IIUC, the Kubelet registers the Node object. Either the Kubelet has permission to read the config it sets on the Node at that time, or it doesn't, and I don't see how this permission changes when the Node object is created.

I think we should probably allow Nodes to reference config at creation time, so an autoscaler can spin up Nodes that immediately refer to the desired config.

Contributor Author (@mtaufen, May 24, 2017)

I re-read your scoped Kubelet access proposal (kubernetes/community#585) just now. If the node authorizer takes "related to the requesting node" to include "node -> configmap", then the permissions I just noted are too broad to keep things scoped to just that Kubelet's node. Since that uses specs as sources of scope information, you're correct that there is an escalation path. The node authorizer would have to either disallow Kubelets from creating nodes that initially reference configuration, or have some other source of permissions information for this configuration.

Contributor Author

I think this is fixed with the additions to plugin/pkg/admission/noderestriction/admission.go in this PR.
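
For context, a heavily simplified stand-in for the kind of check that closes the escalation; the function name and plumbing here are mine, and the actual noderestriction rule in this PR may differ in detail:

package admissionsketch // illustrative; not the PR's noderestriction code

import (
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// checkNodeConfigSource sketches the rule discussed above: a kubelet acting as
// requestingNode may not create or update its own Node object to set or change
// spec.configSource, since that would let it grant itself read access to arbitrary ConfigMaps.
func checkNodeConfigSource(requestingNode, nodeName string, oldSource, newSource *corev1.NodeConfigSource) error {
	if nodeName != requestingNode {
		return fmt.Errorf("node %q may not modify node %q", requestingNode, nodeName)
	}
	if newSource != nil && !reflect.DeepEqual(oldSource, newSource) {
		return fmt.Errorf("node %q may not set or change its own spec.configSource", nodeName)
	}
	return nil
}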

// Only used for dynamic configuration.
// The maximum number of Kubelet restarts that can occur in ConfigTrialDuration before the
// configuration fails the trial and the Kubelet rolls back to its last-known-good config.
// Default 3, minimum allowed is 2, maximum allowed is 10.
Member

minimum of 2 is odd... why are we required to let the kubelet crash twice?

Member

also, it is currently defaulting to 10

Contributor Author

Ah, thanks for that catch on the default vs. the comment.

The Kubelet has to restart at least once to take up the new configuration, and since the curSymlink would have been updated when the new configuration was downloaded, the startups file will contain one restart since that update. So we can't count the restart as a "crash" until we see 2 restarts.

Member

so the kubelet can't tell when it crashed vs when it shut down in an orderly way to pick up a new config?

Contributor Author

I don't think there's an existing way to do that.

I sketched one idea out just now:
Have a file that contains a bool - True by default (so the initial Kubelet startup isn't treated as a crash). If the Kubelet is about to exit to refresh its config, it writes True to the file. On restart, only startups where the file is False are recorded, and the Kubelet always resets the file to False after checking whether it should record a startup.

But there are edge cases - if you reboot a node for a legitimate reason (like an upgrade), it will be counted as a Kubelet crash unless you write True to the file as part of your reboot sequence.

I could also just make the threshold non-inclusive (> rather than >=), so the minimum would be 1, which might be less confusing. Just relying on the threshold is an imperfect solution, I agree, but I think it gives users decent wiggle-room without the implementation having to track additional state.
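
For reference, a rough sketch of that marker-file idea; the path and helper names are hypothetical, and this is explicitly not what the PR implements:

package crashsketch // hypothetical; not part of this PR

import (
	"io/ioutil"
	"os"
	"strings"
)

// orderlyExitFile holds "true" when the last Kubelet exit was a deliberate restart to pick
// up new configuration, so that restart is not counted against the crash-loop threshold.
const orderlyExitFile = "/var/lib/kubelet/orderly-exit" // hypothetical path

// markOrderlyExit is called just before the Kubelet exits to refresh its config.
func markOrderlyExit() error {
	return ioutil.WriteFile(orderlyExitFile, []byte("true"), 0600)
}

// startupCountsAsCrash reports whether the current startup should count against the
// crash-loop threshold, and resets the marker so the next restart counts unless it is
// explicitly marked orderly first.
func startupCountsAsCrash() (bool, error) {
	data, err := ioutil.ReadFile(orderlyExitFile)
	if os.IsNotExist(err) {
		// First startup: treat as orderly so it isn't recorded as a crash.
		return false, ioutil.WriteFile(orderlyExitFile, []byte("false"), 0600)
	}
	if err != nil {
		return false, err
	}
	orderly := strings.TrimSpace(string(data)) == "true"
	if err := ioutil.WriteFile(orderlyExitFile, []byte("false"), 0600); err != nil {
		return false, err
	}
	return !orderly, nil
}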

Contributor Author (@mtaufen, May 25, 2017)

I made it so the minimum can be 0, meaning no crashes allowed. I'd like to postpone trying to differentiate crashes from config-exits, as I have a feeling any hasty solution I dream up now will be riddled with edge-cases.

} else {
s = format
}
glog.FatalDepth(1, fmt.Sprintf(nodeconfigLogFmt, s))
Member

process exiting inside a config library is unexpected

Contributor Author

Not in this case. We want the Kubelet to complain mightily if something is messing with its configuration filesystem, refusing to start if it can't reliably determine a configuration.

I wouldn't think of this as a library. It's a core part of the Kubelet startup workflow.

Member

returning an error is more expected and makes testing easier. os.Exit inside a config package is unexpected.

Member

+1, please don't exit here

Contributor Author (@mtaufen, May 23, 2017)

Ok, I'll see if I can find a better way to plumb errors around. It was much nicer to implement this under the assumption that the Kubelet should just stop when it hits these classes of error, but the point about testing is enough reason for me to change this.

There are a lot of very low-level, very-bad errors that should prevent the controller from continuing, so I'm going to see if I can find a way to return these fatal-class errors from the high-level Run() without plumbing everything through return values under the hood - which makes error handling much uglier, and makes it harder to differentiate between "fatal" and "non-fatal" error classes. Maybe something along these lines: https://blog.golang.org/errors-are-values.

Contributor Author

In the interest of immediately stopping the controller when things are really really wrong without inserting a bunch of error-handling boilerplate, how do you feel about panics in the very low-level functions that are recovered in Run() and in the sync-loop? These are testable, unlike the fatal errors, allow us to return an error from Run(), and only require a very small change (fatalf calls glog.ErrorDepth and panics instead of calling glog.FatalDepth).

There is precedent in the Go standard library for using panic like this, see https://golang.org/src/encoding/json/decode.go (func (d *decodeState) error(err error), which calls panic, and the recover call in unmarshal, for example).
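
A minimal sketch of that shape, with illustrative names rather than the PR's actual code:

package panicsketch // illustrative only

import (
	"fmt"

	"github.com/golang/glog"
)

// fatalError wraps errors that should abort the controller entirely.
type fatalError struct{ err error }

// fatalf logs at error severity and panics with a fatalError instead of calling
// glog.FatalDepth, so a higher-level recover can turn it back into a returned error.
func fatalf(format string, args ...interface{}) {
	err := fmt.Errorf(format, args...)
	glog.ErrorDepth(1, err)
	panic(fatalError{err})
}

// Run recovers fatalError panics raised by low-level helpers and converts them into an
// ordinary error for the caller; anything else keeps panicking.
func Run(work func() error) (retErr error) {
	defer func() {
		if r := recover(); r != nil {
			if f, ok := r.(fatalError); ok {
				retErr = f.err
				return
			}
			panic(r)
		}
	}()
	return work()
}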

Contributor Author

Changes to the "panic" model are in the second commit, for reference.

}

// TODO(mtaufen): Once the KubeletConfiguration type is decomposed (#44252), allow multiple keys to contain configuration.
// Since we presently only expect one key, the for loop is a convenient way to express "take any value in the map".
Member

I think the initial version of this should define the key we load config from. If we want to decompose into multiple keys later, we'll be able to, but "take the first arbitrary key" limits how we can evolve this in the future.

Contributor Author

How does this limit us? I don't see a problem evolving from "take the first arbitrary key" to "take all arbitrary keys". Besides, as long as it's in alpha we can make any breaking change we want.

This evolution will happen alongside #44252, which should be done for 1.8, prior to dynamic configuration exiting alpha.

Member

It means the only direction we can evolve is "take all arbitrary keys", which I'm not a fan of.

Contributor Author

Why not? The information we care about is the set of config objects (presently just one) in the data associated with those keys. As long as we get the right set of objects, why should we care about how they're distributed across the keys? We can evolve by adding new object types.

We will always have the schematic requirement that all the correct objects exist in the configuration payload. I'm wary of layering arbitrary schematic requirements on top of this, like requiring specific key names, because it just adds unnecessary complexity and more opportunities for user error.

Member

As long as we get the right set of objects, why should we care about how they're distributed across the keys?

We should start with a well-specified key. It is far easier to relax later than it is to tighten.

We can evolve by adding new object types.

In the simplest example, consider a kubelet built with this PR. Pointed at a configmap containing two keys, one holding the current config object type and the other a new config object type, it will fail if it encounters the unknown object first.

Instead, if it looked for a specific key, then a single configmap could provide a compatible object to kubelets with this PR, and a new config object type to kubelets that knew to look for the appropriate key.

I'm wary of layering arbitrary schematic requirements on top of this, like requiring specific key names, because it just adds unnecessary complexity and more opportunities for user error.

Given the naming requirements around the config map hash/algorithm, I assume a tool will be needed to generate these. It should set the key names correctly.

Contributor Author

In general, it wouldn't be safe to roll out newly-typed configuration until all clients are updated to understand it. If we were planning on adding a new config object, we'd have to make it optional, upgrade the Kubelets to understand it, and then roll out the new configurations that use that object.

That said, I see some usefulness to being able to express a transition of configuration across Kubelet upgrades in the ConfigMap. What would you think of requiring the key name match the minimum version of the Kubelet that should be allowed to use a given configuration, e.g. 1.7.0, or a key expressing the version of the API group that the config came from, e.g. kubelet.v1alpha1? A tool could definitely be created to understand these relationships.

Alternatively, we could just glue each config object to a key name, and Kubelets could just ignore keys they don't understand.

For this PR, I'll just require the config live in a key called kubelet.v1alpha1, and we can keep discussing this as the feature develops.

Member (@liggitt, May 24, 2017)

Alternatively, we could just glue each config object to a key name, and Kubelets could just ignore keys they don't understand.

That is what I expected, and seems more in line with how most APIs behave

Contributor Author

Ok. I'll actually reduce it all the way back to the kubelet key that was originally in the proposal, so we don't commit to any scheme beyond that.
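
For illustration, reading the payload from a well-known key might look roughly like this; the function name and error text are mine, and the key name follows the comment above:

package keysketch // illustrative only

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// configMapConfigKey is the well-known key the Kubelet reads its serialized
// KubeletConfiguration from, per the discussion above.
const configMapConfigKey = "kubelet"

// kubeletConfigPayload returns the raw config data, failing loudly if the key is absent so
// misnamed keys surface as an error rather than "first arbitrary key wins".
func kubeletConfigPayload(cm *corev1.ConfigMap) (string, error) {
	payload, ok := cm.Data[configMapConfigKey]
	if !ok {
		return "", fmt.Errorf("ConfigMap %s/%s is missing required key %q",
			cm.Namespace, cm.Name, configMapConfigKey)
	}
	return payload, nil
}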

@dchen1107
Member

dchen1107 commented Aug 7, 2017

I had some offline discussion with @mtaufen and we agreed to handle the above comments through separate issues (#50215, #50216, and #50217), since this PR is already too big.

Please rebase the PR so we can continue.

@k8s-github-robot k8s-github-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD) label Aug 8, 2017
@mtaufen mtaufen force-pushed the dkcfg branch 2 times, most recently from 93cac39 to 5aa8264, on August 8, 2017 03:51
@luxas
Member

luxas commented Aug 8, 2017

/retest

obj.ConfigTrialDuration = &metav1.Duration{Duration: 10 * time.Minute}
}
if obj.CrashLoopThreshold == nil {
obj.CrashLoopThreshold = utilpointer.Int32Ptr(3)
Member

Can we bump this up to 10 for now, until we resolve #50216? Otherwise, this could be very dangerous: anyone attempting to use this feature could end up with no good kubelet config.

Contributor Author

Done

mtaufen added 3 commits August 8, 2017 12:21
Alpha implementation of the Dynamic Kubelet Configuration feature.
See the proposal doc in kubernetes#29459.
@dchen1107
Member

/retest

@dchen1107
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Aug 8, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, mtaufen, thockin

Associated issue: 281

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@mtaufen
Contributor Author

mtaufen commented Aug 8, 2017

/retest

@mtaufen
Contributor Author

mtaufen commented Aug 8, 2017

/test pull-kubernetes-verify

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 50016, 49583, 49930, 46254, 50337)
