-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubeadm: Make self-hosting work and split it out to a phase #47435
kubeadm: Make self-hosting work and split it out to a phase #47435
Conversation
So this stuff needs to be merged in order, so marking this one as do-not-merge till the others are ready. |
kubeconfigutil "k8s.io/kubernetes/cmd/kubeadm/app/util/kubeconfig" | ||
) | ||
|
||
func NewCmdSelfhosting() *cobra.Command { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should follow the conventions of // godoc on any public facing f(n)s.
kubeadmconstants "k8s.io/kubernetes/cmd/kubeadm/app/constants" | ||
) | ||
|
||
func MutatePodSpec(name string, podSpec *v1.PodSpec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
} | ||
} | ||
|
||
func addFileLockToPodSpec(podSpec *v1.PodSpec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely clear what you are doing here and why?
// Wait for the self-hosted component to come up | ||
kubeadmutil.WaitForPodsWithLabel(client, component, fmt.Sprintf("k8s-app=self-hosted-%s", component)) | ||
|
||
manifestPath := buildStaticManifestFilepath(component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment this, it's totally unclear to anyone not working in this area what you are doing, and reference your phases.
|
||
manifestPath := buildStaticManifestFilepath(component) | ||
if err := os.RemoveAll(manifestPath); err != nil { | ||
return fmt.Errorf("unable to delete static pod manifest for %s [%v]", component, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we now potentially leave it in a 1/2 deployed state? and future creates could fail?
} | ||
|
||
func buildDaemonSet(name string) (*extensions.DaemonSet, error) { | ||
podSpec, err := loadPodSpecFromDisk(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again comment on why you are loading from the original static manifest.
|
||
return &extensions.DaemonSet{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Name: "self-hosted-" + name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add either a label or name that it was kubeadm that did it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, what do you prefer?
*/ | ||
|
||
// TODO: unit tests | ||
package selfhosting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why even have this? ¯_(ツ)_/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add unit tests shortly
cmd/kubeadm/app/util/apiclient.go
Outdated
// WaitForPodsWithLabel will lookup pods with the given label and wait until they are all | ||
// reporting status as running. | ||
func WaitForPodsWithLabel(client *clientset.Clientset, name, labelKeyValPair string) { | ||
wait.PollInfinite(kubeadmconstants.APICallRetryInterval, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we poll timeout in case something weird happens on the kubelet? B/c nothing weird ever happens on the kubelet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you prefer? How long should we wait before timing out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a timeout flag in the selfhosted command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be a future improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open an issue so we won't forget? Assign it to me if you like and I should get to it soon.
package selfhosting | ||
|
||
import ( | ||
"k8s.io/client-go/pkg/api/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved recently to k8s.io/api/core/v1
e203fbc
to
0e7fbed
Compare
0e7fbed
to
b2f68de
Compare
@timothysc PTAL
This PR might look scary with a lot of changed lines, but in fact the actual code decreases with ~80 LoC. However, the self-hosting code didn't have any comments or unit tests before, but now I've added ~700 LoC of unit tests :). cc @jamiehannaford PTAL |
b2f68de
to
8ad4ebe
Compare
kubeadmconstants "k8s.io/kubernetes/cmd/kubeadm/app/constants" | ||
) | ||
|
||
// MutatePodSpec makes a Static Pod-hosted PodSpec suitable for self-hosting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lowercase
// 8. In order to avoid race conditions, we're still making sure the API /healthz endpoint is healthy | ||
// 9. Do that for the kube-apiserver, kube-controller-manager and kube-scheduler in a loop | ||
func CreateSelfHostedControlPlane(client *clientset.Clientset) error { | ||
components := []string{kubeAPIServer, kubeControllerManager, kubeScheduler} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment that the sequence matters or that it doesn't.
|
||
// Create the DaemonSet in the API Server | ||
if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Create(ds); err != nil { | ||
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to tolerate AlreadyExists errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so
} | ||
for _, pod := range apiPods.Items { | ||
fmt.Printf("[apiclient] Pod %s status: %s\n", pod.Name, pod.Status.Phase) | ||
if pod.Status.Phase != v1.PodRunning { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you prefer adding the /healthz check inside WaitForAPI as a readiness probe inside the static pod or by mutating the DaemonSet pod spec and then wait for Ready pods here instead of Running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to disable healthz check on the bootstrap apiserver for self-hosted etcd due to various race conditions, so I think this is the best solution for time being
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall, just a few minor issues
cmd/kubeadm/app/cmd/init.go
Outdated
@@ -132,6 +133,10 @@ func NewCmdInit(out io.Writer) *cobra.Command { | |||
&skipTokenPrint, "skip-token-print", skipTokenPrint, | |||
"Skip printing of the default bootstrap token generated by 'kubeadm init'", | |||
) | |||
cmd.PersistentFlags().BoolVar( | |||
&cfg.SelfHosted, "self-hosted", cfg.SelfHosted, | |||
"If kubeadm should make this control plane self-hosted", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we consider self-hosted experimental? If so, we might want to add something like [experimental]
. We could also log a warning at the top of the init output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we consider self-hosted experimental?
I think so... Or alpha
|
||
// Create the DaemonSet in the API Server | ||
if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Create(ds); err != nil { | ||
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so
Name: "self-hosted-" + name, | ||
Namespace: metav1.NamespaceSystem, | ||
Labels: map[string]string{ | ||
"k8s-app": selfHostingLabelValuePrefix + name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen in bootkube a tier=control-plane
label, do you think it's useful to add or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, as I don't recognize any practical purpose for it
The less code the better
setMasterTolerationOnPodSpec, | ||
setRightDNSPolicyOnPodSpec, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For self-hosted etcd we'll also need to modify the podSpec command
(mainly the apiserver). I assume we can pass in the additional flags which either overwrite or append to the existing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I have coded some methods for modifying .command
but didn't include that in this PR for easier reviewing
} | ||
|
||
// addNodeSelectorToPodSpec makes Pod require to be scheduled on a node marked with the master label | ||
func addNodeSelectorToPodSpec(podSpec *v1.PodSpec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Really like this composable way of mutating the original podSpec
- --kubeconfig=/etc/kubernetes/controller-manager.conf | ||
- --controllers=*,bootstrapsigner,tokencleaner | ||
- --root-ca-file=/etc/kubernetes/pki/ca.crt | ||
- --address=127.0.0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change this arg to 0.0.0.0
. that will allow us to take off host network. same for scheduler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure why that would allow us to remove hostNetwork
...
As I understand this, the current state is "use the host's network, but don't expose anything widely"
We still need hostNetwork for being able to run things without CNI set up.
livenessProbe: | ||
failureThreshold: 8 | ||
httpGet: | ||
host: 127.0.0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, can we make this an empty string? it defaults to pod IP if left blank. I see no reason to hardcode localhost here, especially if we want to move off host network
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not do that in this PR. It's set in app/phases/controlplane/manifests.go
since a long time back.
This is just testing data, but based on actual output from kubeadm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also ref this statement: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Here’s one scenario where you would set it. Suppose the Container listens on 127.0.0.1 and the Pod’s hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1.
To me it seems like it's required for hostNetwork=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but my point is that the scheduler and controller manager don't need to be on host network. For apiserver (which is on host network), leaving blank will still work because it'll just ping that pod's IP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They must be on the host network due to that CNI is uninitialized, so we don't have any other option here really
} | ||
} | ||
|
||
func createTempFileWithContent(content []byte) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: I don't think unit tests should be modify external state - whether it be filesystem or DB. But due to the current state of our e2e tests maybe it's better to put here instead. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen lots of unit tests that create temp files and then remove them.
A quick search for ioutil.TempFile(
for files kubernetes/*_test.go
yielded 170 results.
It's a pattern I definitely think we can use
kubeControllerManager: { | ||
addNodeSelectorToPodSpec, | ||
setMasterTolerationOnPodSpec, | ||
setRightDNSPolicyOnPodSpec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed bootkube's self-hosted controller manager uses Default
as the DNS policy. @aaronlevy is there any reason for that? Just want to make sure there's no corner cases we're forgetting about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at the git blame I believe it had to do with the controller-manager needing to use the dns of the host when cloud-providers were enabled (for example, aws hostnames won't resolve unless you're using the host defined dns server). And at that time DNSClusterFirstWithHostNet
didn't exist -- but seems like a more reasonable choice now (opened a PR to change in bootkube: kubernetes-retired/bootkube#629)
// Wait for the self-hosted component to come up | ||
kubeadmutil.WaitForPodsWithLabel(client, componentName, buildSelfHostedWorkloadLabelQuery(componentName)) | ||
|
||
// Remove the old Static Pod manifest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put all the teardown (L79-87) into a new function and have the init call it after CreateSelfHostedControlPlane? The reason I ask is that it'll need to happen after the etcd-operator deployment and pivot, so doing it now will reduce the future delta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @jamiehannaford on Slack and decided to hold off that for now, and re-order the sequence instead. This is fine as-is now and we'll change it if we change it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address mine and other comments.
poke again when done.
cmd/kubeadm/app/cmd/init.go
Outdated
@@ -132,6 +133,10 @@ func NewCmdInit(out io.Writer) *cobra.Command { | |||
&skipTokenPrint, "skip-token-print", skipTokenPrint, | |||
"Skip printing of the default bootstrap token generated by 'kubeadm init'", | |||
) | |||
cmd.PersistentFlags().BoolVar( | |||
&cfg.SelfHosted, "self-hosted", cfg.SelfHosted, | |||
"If kubeadm should make this control plane self-hosted", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we consider self-hosted experimental?
I think so... Or alpha
// MutatePodSpec makes a Static Pod-hosted PodSpec suitable for self-hosting | ||
func mutatePodSpec(name string, podSpec *v1.PodSpec) { | ||
mutators := map[string][]func(*v1.PodSpec){ | ||
kubeAPIServer: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you merge the secret change? I don't see how you can float without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure what you mean... This works
// 2. Extract the PodSpec from that Static Pod specification | ||
// 3. Mutate the PodSpec to be compatible with self-hosting (add the right labels, taints, etc. so it can schedule correctly) | ||
// 4. Build a new DaemonSet object for the self-hosted component in question. Use the PodSpec above mentioned PodSpec | ||
// 5. Create the DaemonSet resource. Wait for the Pod to become Running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait until the Pods are running ;-)
// 8. In order to avoid race conditions, we're still making sure the API /healthz endpoint is healthy | ||
// 9. Do that for the kube-apiserver, kube-controller-manager and kube-scheduler in a loop | ||
func CreateSelfHostedControlPlane(client *clientset.Clientset) error { | ||
components := []string{kubeAPIServer, kubeControllerManager, kubeScheduler} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to treat them as separate Pods, or containers as part of the same pod?
I'm not certain if we ever discussed a tradeoff here?
The reason I ask is it simplifies the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe folks would want to scale more apiservers than schedulers/CMs, or vice versa? Just a thought
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes a lot of sense to decouple as three different DaemonSets (==3 different Pods) instead of having one DS and one Pod per master.
// Return a DaemonSet based on that Spec | ||
return &extensions.DaemonSet{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Name: "self-hosted-" + name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for mutating function with strings pushed to consts.
@timothysc @jamiehannaford @Kargakis PTAL, updated the PR |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err) | ||
} | ||
|
||
if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Update(ds); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may fail on status updates in the DaemonSet so I would make this retriable on 409s. TODO is also fine.
} | ||
|
||
if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Update(ds); err != nil { | ||
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failed to update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: luxas, timothysc Assign the PR to them by writing Associated issue: 127 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
2b8a504
to
9f1c5a6
Compare
Squashed to two logical comments. Reapplying the labels |
/retest |
Automatic merge from submit-queue (batch tested with PRs 47435, 46044) |
What this PR does / why we need it:
phases/selfhosting
kubeadm alpha phase selfhosting
command that can be invoked against any kubeadm-cluster after install.Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #fixes: kubernetes/kubeadm#127
(large part of at least)
Special notes for your reviewer:
Please only review the fourth commit, based on #47345
Release note:
@kubernetes/sig-cluster-lifecycle-pr-reviews @jbeda