-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alpha Dynamic Kubelet Configuration #46254
Conversation
cmd/kubelet/app/server.go
Outdated
if err != nil { | ||
return err | ||
} | ||
s.KubeletConfiguration = kcToUseInternal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts on the structure of this:
- can we separate the dynamic config setup into a separate function/file to make the inputs (config and client) and outputs (controller and resulting config) clearer?
- mutating the config in the KubeletServer object makes this hard to reason about. can you hold the dynamically fetched config locally and propagate it to where it needs to go instead?
- I would expect all the dynamic config lookup to happen after the BootstrapKubeconfig handling... otherwise, bootstrapped kubelets won't have a client yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split that into another function in server.go.
KubeletServer already gets propagated to where the config is needed, and this concept is deeply threaded through the server.go code. I'm open to discussing whether this is the right flow to have, but I think it's out of scope for this PR - which is really more focused on the "obtain a valid config" part vs. the "pass that config around" part.
I'll look into moving the setup to after the BootstrapKubeconfig handling.
cmd/kubelet/app/server.go
Outdated
|
||
var ncc *nodeconfig.NodeConfigController | ||
if useDynamicConfig { | ||
client, err := getKubeClient(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will fail in TLS bootstrap cases where the kubelet doesn't have a client yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into moving this after the TLS bootstrap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that TLS bootstrap shouldn't depend on the KubeletConfiguration, but who was I kidding, everything depends on the KubeletConfiguration ;). It needs ContentType, KubeAPIQPS, and KubeAPIBurst. This creates a mild chicken-egg problem.
Present "solution" (imperfect) is to do the bare minimum to get a client based on the config supplied at startup and initialize the config controller with this client, then allow the client to be regenerated once we have a validated config.
This still leaves the config controller using the old client for its sync-loop, which means the QPS, etc. parameters might not be uniform between the clients. I'll look into splitting starting the conifg sync-loop out from running the Run()
function on the controller, so that any new client can be injected there when it is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that TLS bootstrap shouldn't depend on the KubeletConfiguration, but who was I kidding, everything depends on the KubeletConfiguration ;). It needs ContentType, KubeAPIQPS, and KubeAPIBurst. This creates a mild chicken-egg problem.
I don't see those settings being required. From what I can see, the kubelet TLS bootstrap requires this information:
- bootstrap kubeconfig file
- path to write resulting kubeconfig to
- dir to write resulting client certificate to
- name of the node (which requires the following info to determine)
- cloud provider
- cloud provider config file
- hostname override
Those files and paths don't seems like something you'd need to be dynamic, and the name of the node is needed to look up the dynamic config in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. I was erroneously thinking of "get a client" as somehow involved because I saw that path reading the kubeconfig file. It's the client getting that needs the API call config parameters I mentioned above.
The cert dir, cloud provider, and cloud provider config file path, however, are still on the KubeletConfiguration. I wasn't planning on touching the KubeletConfiguration type itself in this PR, since I had another PR for refactoring that... but I suppose it doesn't hurt to remove a few parameters from it that clearly shouldn't be dynamic - especially given it means we won't have to worry about regenerating certs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok those fields are yanked, no cert regen.
Clients are regenerated from latest config so that the QPS, etc. parameters are up-to-date, and then the latest client is injected when starting the sync loop.
cmd/kubelet/app/server.go
Outdated
return err | ||
// when you run the controller, either you get back a valid config to use, or the Kubelet crashes because | ||
// something was fatally wrong with the configuration. Non-fatal errors will be logged, but not returned from Run(). | ||
kcToUse := ncc.Run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still expect an error returned, not an internal panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the Kubelet cannot reliably determine a configuration to use, it should refuse to start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree, but that should happen via a bubbling error, not a panic deep in a config library
pkg/api/v1/types.go
Outdated
// If specified, the source to get node configuration from | ||
// The DynamicKubeletConfig feature gate must be enabled for the Kubelet to use this field | ||
// +optional | ||
ConfigSource *NodeConfigSource `json:"configSource,omitempty" protobuf:"bytes,6,opt,name=configSource"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we determine whether a node is permitted to access a particular configmap? nodes create their API objects on startup, if needed. Should they be permitted to create a Node API object that references a configmap config source? That's letting a node escalate their view permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Node can use the configuration from a particular ConfigMap if its Kubelet can read that ConfigMap. IIUC, the Kubelet registers the Node object. Either the Kubelet has permission to read the config it sets on the Node at that time, or it doesn't, and I don't see how this permission changes when the Node object is created.
I think we should probably allow Nodes to reference config at creation time, so an autoscaler can spin up Nodes that immediately refer to the desired config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-read your scoped Kubelet access proposal (kubernetes/community#585) just now. If the node authorizer takes "related to the requesting node" to include "node -> configmap", then the permissions I just noted are too broad to keep things scoped to just that Kubelet's node. Since that uses specs as sources of scope information, you're correct that there is an escalation path. The node authorizer would have to either disallow Kubelets from creating nodes that initially reference configuration, or have some other source of permissions information for this configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fixed with the additions to plugin/pkg/admission/noderestriction/admission.go
in this PR.
pkg/apis/componentconfig/types.go
Outdated
// Only used for dynamic configuration. | ||
// The maximum number of Kubelet restarts that can occur in ConfigTrialDuration before the | ||
// configuration fails the trial and the Kubelet rolls back to its last-known-good config. | ||
// Default 3, mimimum allowed is 2, maximum allowed is 10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimum of 2 is odd... why are we required to let the kubelet crash twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, it is currently defaulting to 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for that catch on the default vs. the comment.
The Kubelet has to restart at least once to take up the new configuration, and since the curSymlink would have been updated when the new configuration was downloaded, the startups file will contain one restart since that update. So we can't count the restart as a "crash" until we see 2 restarts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the kubelet can't tell when it crashed vs when it shut down in an orderly way to pick up a new config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's an existing way to do that.
I sketched one idea out just now:
Have a file that contains a bool - True by default (so the initial Kubelet startup isn't treated as a crash). If the Kubelet is about to exit to refresh its config, it writes True to the file. On restart, only startups where the file is False are recorded, and the Kubelet always resets the file to False after checking whether it should record a startup.
But there are edge cases - if you reboot a node for a legitimate reason (like an upgrade), it will be counted as a Kubelet crash unless you write True to the file as part of your reboot sequence.
I could also just make the threshold non-inclusive (> rather than >=), so the minimum would be 1, which might be less confusing. Just relying on the threshold is an imperfect solution, I agree, but I think it gives users decent wiggle-room without the implementation having to track additional state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it so the minimum can be 1 0, meaning no crashes allowed. I'd like to postpone trying to differentiate crashes from config-exits, as I have a feeling any hasty solution I dream up now will be riddled with edge-cases.
pkg/kubelet/nodeconfig/log.go
Outdated
} else { | ||
s = format | ||
} | ||
glog.FatalDepth(1, fmt.Sprintf(nodeconfigLogFmt, s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
process exiting inside a config library is unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this case. We want the Kubelet to complain mightily if something is messing with its configuration filesystem by refusing to start if it can't reliably determine a configuration.
I wouldn't think of this as a library. It's a core part of the Kubelet startup workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returning an error is more expected and makes testing easier. os.Exit inside a config package is unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, please don't exit here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll see if I can find a better way to plumb errors around. It was much nicer to implement this under the assumption that the Kubelet should just stop when it hits these classes of error, but the point about testing is enough reason for me to change this.
There are a lot of very low-level, very-bad errors that should prevent the controller from continuing, so I'm going to see if I can find a way return these fatal-class errors from the high-level Run()
without plumbing everything through return values under the hood - which makes error handling much uglier, and makes it harder to differentiate between "fatal" and "non-fatal" error classes. Maybe something along these lines: https://blog.golang.org/errors-are-values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of immediately stopping the controller when things are really really wrong without inserting a bunch of error-handling boilerplate, how do you feel about panics in the very low-level functions that are recovered in Run()
and in the sync-loop? These are testable, unlike the fatal errors, allow us to return an error from Run()
, and only require a very small change (fatalf
calls glog.ErrorDepth
and panics instead of calling glog.FatalDepth
).
There is precedent in the Go standard library for using panic
like this, see https://golang.org/src/encoding/json/decode.go (func (d *decodeState) error(err error)
, which calls panic
, and the recover
call in unmarshal
, for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to the "panic" model are in the second commit, for reference.
pkg/kubelet/nodeconfig/parse.go
Outdated
} | ||
|
||
// TODO(mtaufen): Once the KubeletConfiguration type is decomposed (#44252), allow multiple keys to contain configuration. | ||
// Since we presently only expect one key, the for loop is a convenient way to express "take any value in the map". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the initial version of this should define the key we load config from. If we want to decompose into multiple keys later, we'll be able to, but "take the first arbitrary key" limits how we can evolve this in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this limit us? I don't see a problem evolving from "take the first arbitrary key" to "take all arbitrary keys". Besides, as long as it's in alpha we can make any breaking change we want.
This evolution will happen alongside #44252, which should be done for 1.8, prior to dynamic configuration exiting alpha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means the only direction we can evolve is "take all arbitrary keys", which I'm not a fan of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? The information we care about is the set of config objects (presently just one) in the data associated with those keys. As long as we get the right set of objects, why should we care about how they're distributed across the keys? We can evolve by adding new object types.
We will always have the schematic requirement that all the correct objects exist in the configuration payload. I'm wary of layering arbitrary schematic requirements on top of this, like requiring specific key names, because it just adds unnecessary complexity and more opportunities for user error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we get the right set of objects, why should we care about how they're distributed across the keys?
We should start with a well-specified key. It is far easier to relax later than it is to tighten.
We can evolve by adding new object types.
In the simplest example, consider a kubelet built with this PR. Pointed at a configmap containing two keys, one of which is the current config object type and the other is a new config object type, it will fail if it encounters the other unknown object first.
Instead, if it looked for a specific key, then a single configmap could provide a compatible object to kubelets with this PR, and a new config object type to kubelets that knew to look for the appropriate key.
I'm wary of layering arbitrary schematic requirements on top of this, like requiring specific key names, because it just adds unnecessary complexity and more opportunities for user error.
Given the naming requirements around the config map hash/algorithm, I assume a tool will be needed to generate these. It should set the key names correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, it wouldn't be safe to roll out newly-typed configuration until all clients are updated to understand it. If we were planning on adding a new config object, we'd have to make it optional, upgrade the Kubelets to understand it, and then roll out the new configurations that use that object.
That said, I see some usefulness to being able to express a transition of configuration across Kubelet upgrades in the ConfigMap. What would you think of requiring the key name match the minimum version of the Kubelet that should be allowed to use a given configuration, e.g. 1.7.0
, or a key expressing the version of the API group that the config came from, e.g. kubelet.v1alpha1
? A tool could definitely be created to understand these relationships.
Alternatively, we could just glue each config object to a key name, and Kubelets could just ignore keys they don't understand.
For this PR, I'll just require the config live in a key called kubelet.v1alpha1
, and we can keep discussing this as the feature develops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could just glue each config object to a key name, and Kubelets could just ignore keys they don't understand.
That is what I expected, and seems more in line with how most APIs behave
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll actually reduce it all the way back to the kubelet
key that was originally in the proposal, so we don't commit to any scheme beyond that.
93cac39
to
5aa8264
Compare
/retest |
obj.ConfigTrialDuration = &metav1.Duration{Duration: 10 * time.Minute} | ||
} | ||
if obj.CrashLoopThreshold == nil { | ||
obj.CrashLoopThreshold = utilpointer.Int32Ptr(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we bump this up to 10 for now until we resolve #50216? Otherwise, this could be very dangerous for anyone attempt to use this feature end up to no good kubelet config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Alpha implementation of the Dynamic Kubelet Configuration feature. See the proposal doc in kubernetes#29459.
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dchen1107, mtaufen, thockin Associated issue: 281 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest |
/test pull-kubernetes-verify |
/retest Review the full test history for this PR. |
Automatic merge from submit-queue (batch tested with PRs 50016, 49583, 49930, 46254, 50337) |
Feature: kubernetes/enhancements#281
This proposal contains the alpha implementation of the Dynamic Kubelet Configuration feature proposed in
#29459community/contributors/design-proposals/dynamic-kubelet-configuration.md.Please note:
The proposal doc is not yet up to date with this implementation, there are some subtle differences and some more significant ones. I will update the proposal doc to match by tomorrow afternoon.This obviously needs more tests. I plan to write several O(soon). Since it's alpha and feature-gated, I'm decoupling this review from the review of the tests.I've beefed up the unit tests, though there is still plenty of testing to be done.I'm temporarily holding off on updating the generated docs, api specs, etc, for the sake of my reviewers 😄these files now live in a separate commit; the first commit is the one to review./cc @dchen1107 @vishh @bgrant0607 @thockin @derekwaynecarr