Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: Make self-hosting work and split it out to a phase #47435

Merged
merged 2 commits into from
Jul 6, 2017

Conversation

luxas
Copy link
Member

@luxas luxas commented Jun 13, 2017

What this PR does / why we need it:

  • Removes the old self-hosting code
  • Puts the new self-hosting code in phases/selfhosting
    • The new code reads manifests from disk (static pods)...
    • ...mutates the PodSpec as necessary...
    • ...and posts the DaemonSet to the API Server...
    • ...and waits for it to come up
  • Uses DaemonSets for all control plane components
  • Creates a kubeadm alpha phase selfhosting command that can be invoked against any kubeadm-cluster after install.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

fixes: kubernetes/kubeadm#127
(large part of at least)

Special notes for your reviewer:

Please only review the fourth commit, based on #47345

Release note:

kubeadm: Make self-hosting work by using DaemonSets and split it out to a phase that can be invoked via the CLI

@kubernetes/sig-cluster-lifecycle-pr-reviews @jbeda

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 13, 2017
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 13, 2017
@timothysc timothysc added this to the v1.8 milestone Jun 14, 2017
@timothysc timothysc added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jun 14, 2017
@timothysc
Copy link
Member

So this stuff needs to be merged in order, so marking this one as do-not-merge till the others are ready.

kubeconfigutil "k8s.io/kubernetes/cmd/kubeadm/app/util/kubeconfig"
)

func NewCmdSelfhosting() *cobra.Command {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow the conventions of // godoc on any public facing f(n)s.

kubeadmconstants "k8s.io/kubernetes/cmd/kubeadm/app/constants"
)

func MutatePodSpec(name string, podSpec *v1.PodSpec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

}
}

func addFileLockToPodSpec(podSpec *v1.PodSpec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely clear what you are doing here and why?

// Wait for the self-hosted component to come up
kubeadmutil.WaitForPodsWithLabel(client, component, fmt.Sprintf("k8s-app=self-hosted-%s", component))

manifestPath := buildStaticManifestFilepath(component)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment this, it's totally unclear to anyone not working in this area what you are doing, and reference your phases.


manifestPath := buildStaticManifestFilepath(component)
if err := os.RemoveAll(manifestPath); err != nil {
return fmt.Errorf("unable to delete static pod manifest for %s [%v]", component, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we now potentially leave it in a 1/2 deployed state? and future creates could fail?

}

func buildDaemonSet(name string) (*extensions.DaemonSet, error) {
podSpec, err := loadPodSpecFromDisk(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again comment on why you are loading from the original static manifest.


return &extensions.DaemonSet{
ObjectMeta: metav1.ObjectMeta{
Name: "self-hosted-" + name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add either a label or name that it was kubeadm that did it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, what do you prefer?

*/

// TODO: unit tests
package selfhosting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why even have this? ¯_(ツ)_/¯

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add unit tests shortly

// WaitForPodsWithLabel will lookup pods with the given label and wait until they are all
// reporting status as running.
func WaitForPodsWithLabel(client *clientset.Clientset, name, labelKeyValPair string) {
wait.PollInfinite(kubeadmconstants.APICallRetryInterval, func() (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we poll timeout in case something weird happens on the kubelet? B/c nothing weird ever happens on the kubelet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you prefer? How long should we wait before timing out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a timeout flag in the selfhosted command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be a future improvement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open an issue so we won't forget? Assign it to me if you like and I should get to it soon.

@luxas luxas changed the title kubeadm: Make self-hosting work and split it out to a phase WIP: kubeadm: Make self-hosting work and split it out to a phase Jun 14, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2017
package selfhosting

import (
"k8s.io/client-go/pkg/api/v1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved recently to k8s.io/api/core/v1

@luxas luxas force-pushed the kubeadm_new_selfhosting branch from e203fbc to 0e7fbed Compare July 4, 2017 16:28
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 4, 2017
@luxas luxas changed the title WIP: kubeadm: Make self-hosting work and split it out to a phase kubeadm: Make self-hosting work and split it out to a phase Jul 4, 2017
@luxas luxas force-pushed the kubeadm_new_selfhosting branch from 0e7fbed to b2f68de Compare July 4, 2017 19:31
@luxas luxas removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jul 4, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 4, 2017
@luxas
Copy link
Member Author

luxas commented Jul 4, 2017

@timothysc PTAL

  • Rebased
  • Improved the code
  • Commented all functions
  • Added lots of unit tests (in their own commit for easier reviewing)

This PR might look scary with a lot of changed lines, but in fact the actual code decreases with ~80 LoC.

However, the self-hosting code didn't have any comments or unit tests before, but now I've added ~700 LoC of unit tests :).
It's also possible to cleanup more code after this PRs is merged, but saving those cleanup tasks to a future PR

cc @jamiehannaford PTAL

@luxas luxas force-pushed the kubeadm_new_selfhosting branch from b2f68de to 8ad4ebe Compare July 4, 2017 20:14
@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 4, 2017
kubeadmconstants "k8s.io/kubernetes/cmd/kubeadm/app/constants"
)

// MutatePodSpec makes a Static Pod-hosted PodSpec suitable for self-hosting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lowercase

// 8. In order to avoid race conditions, we're still making sure the API /healthz endpoint is healthy
// 9. Do that for the kube-apiserver, kube-controller-manager and kube-scheduler in a loop
func CreateSelfHostedControlPlane(client *clientset.Clientset) error {
components := []string{kubeAPIServer, kubeControllerManager, kubeScheduler}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that the sequence matters or that it doesn't.


// Create the DaemonSet in the API Server
if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Create(ds); err != nil {
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to tolerate AlreadyExists errors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so

}
for _, pod := range apiPods.Items {
fmt.Printf("[apiclient] Pod %s status: %s\n", pod.Name, pod.Status.Phase)
if pod.Status.Phase != v1.PodRunning {
Copy link
Contributor

@0xmichalis 0xmichalis Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer adding the /healthz check inside WaitForAPI as a readiness probe inside the static pod or by mutating the DaemonSet pod spec and then wait for Ready pods here instead of Running?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to disable healthz check on the bootstrap apiserver for self-hosted etcd due to various race conditions, so I think this is the best solution for time being

Copy link
Contributor

@jamiehannaford jamiehannaford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall, just a few minor issues

@@ -132,6 +133,10 @@ func NewCmdInit(out io.Writer) *cobra.Command {
&skipTokenPrint, "skip-token-print", skipTokenPrint,
"Skip printing of the default bootstrap token generated by 'kubeadm init'",
)
cmd.PersistentFlags().BoolVar(
&cfg.SelfHosted, "self-hosted", cfg.SelfHosted,
"If kubeadm should make this control plane self-hosted",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we consider self-hosted experimental? If so, we might want to add something like [experimental]. We could also log a warning at the top of the init output

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we consider self-hosted experimental?

I think so... Or alpha


// Create the DaemonSet in the API Server
if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Create(ds); err != nil {
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so

Name: "self-hosted-" + name,
Namespace: metav1.NamespaceSystem,
Labels: map[string]string{
"k8s-app": selfHostingLabelValuePrefix + name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen in bootkube a tier=control-plane label, do you think it's useful to add or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, as I don't recognize any practical purpose for it
The less code the better

setMasterTolerationOnPodSpec,
setRightDNSPolicyOnPodSpec,
},
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For self-hosted etcd we'll also need to modify the podSpec command (mainly the apiserver). I assume we can pass in the additional flags which either overwrite or append to the existing one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I have coded some methods for modifying .command but didn't include that in this PR for easier reviewing

}

// addNodeSelectorToPodSpec makes Pod require to be scheduled on a node marked with the master label
func addNodeSelectorToPodSpec(podSpec *v1.PodSpec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Really like this composable way of mutating the original podSpec

- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --controllers=*,bootstrapsigner,tokencleaner
- --root-ca-file=/etc/kubernetes/pki/ca.crt
- --address=127.0.0.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change this arg to 0.0.0.0. that will allow us to take off host network. same for scheduler

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure why that would allow us to remove hostNetwork...
As I understand this, the current state is "use the host's network, but don't expose anything widely"
We still need hostNetwork for being able to run things without CNI set up.

livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, can we make this an empty string? it defaults to pod IP if left blank. I see no reason to hardcode localhost here, especially if we want to move off host network

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not do that in this PR. It's set in app/phases/controlplane/manifests.go since a long time back.

This is just testing data, but based on actual output from kubeadm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ref this statement: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

Here’s one scenario where you would set it. Suppose the Container listens on 127.0.0.1 and the Pod’s hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1.

To me it seems like it's required for hostNetwork=true

Copy link
Contributor

@jamiehannaford jamiehannaford Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but my point is that the scheduler and controller manager don't need to be on host network. For apiserver (which is on host network), leaving blank will still work because it'll just ping that pod's IP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They must be on the host network due to that CNI is uninitialized, so we don't have any other option here really

}
}

func createTempFileWithContent(content []byte) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: I don't think unit tests should be modify external state - whether it be filesystem or DB. But due to the current state of our e2e tests maybe it's better to put here instead. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen lots of unit tests that create temp files and then remove them.
A quick search for ioutil.TempFile( for files kubernetes/*_test.go yielded 170 results.
It's a pattern I definitely think we can use

kubeControllerManager: {
addNodeSelectorToPodSpec,
setMasterTolerationOnPodSpec,
setRightDNSPolicyOnPodSpec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed bootkube's self-hosted controller manager uses Default as the DNS policy. @aaronlevy is there any reason for that? Just want to make sure there's no corner cases we're forgetting about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at the git blame I believe it had to do with the controller-manager needing to use the dns of the host when cloud-providers were enabled (for example, aws hostnames won't resolve unless you're using the host defined dns server). And at that time DNSClusterFirstWithHostNet didn't exist -- but seems like a more reasonable choice now (opened a PR to change in bootkube: kubernetes-retired/bootkube#629)

// Wait for the self-hosted component to come up
kubeadmutil.WaitForPodsWithLabel(client, componentName, buildSelfHostedWorkloadLabelQuery(componentName))

// Remove the old Static Pod manifest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put all the teardown (L79-87) into a new function and have the init call it after CreateSelfHostedControlPlane? The reason I ask is that it'll need to happen after the etcd-operator deployment and pivot, so doing it now will reduce the future delta

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @jamiehannaford on Slack and decided to hold off that for now, and re-order the sequence instead. This is fine as-is now and we'll change it if we change it in the future.

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address mine and other comments.

poke again when done.

@@ -132,6 +133,10 @@ func NewCmdInit(out io.Writer) *cobra.Command {
&skipTokenPrint, "skip-token-print", skipTokenPrint,
"Skip printing of the default bootstrap token generated by 'kubeadm init'",
)
cmd.PersistentFlags().BoolVar(
&cfg.SelfHosted, "self-hosted", cfg.SelfHosted,
"If kubeadm should make this control plane self-hosted",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we consider self-hosted experimental?

I think so... Or alpha

// MutatePodSpec makes a Static Pod-hosted PodSpec suitable for self-hosting
func mutatePodSpec(name string, podSpec *v1.PodSpec) {
mutators := map[string][]func(*v1.PodSpec){
kubeAPIServer: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you merge the secret change? I don't see how you can float without it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure what you mean... This works

// 2. Extract the PodSpec from that Static Pod specification
// 3. Mutate the PodSpec to be compatible with self-hosting (add the right labels, taints, etc. so it can schedule correctly)
// 4. Build a new DaemonSet object for the self-hosted component in question. Use the PodSpec above mentioned PodSpec
// 5. Create the DaemonSet resource. Wait for the Pod to become Running.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait until the Pods are running ;-)

// 8. In order to avoid race conditions, we're still making sure the API /healthz endpoint is healthy
// 9. Do that for the kube-apiserver, kube-controller-manager and kube-scheduler in a loop
func CreateSelfHostedControlPlane(client *clientset.Clientset) error {
components := []string{kubeAPIServer, kubeControllerManager, kubeScheduler}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to treat them as separate Pods, or containers as part of the same pod?

I'm not certain if we ever discussed a tradeoff here?

The reason I ask is it simplifies the model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe folks would want to scale more apiservers than schedulers/CMs, or vice versa? Just a thought

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes a lot of sense to decouple as three different DaemonSets (==3 different Pods) instead of having one DS and one Pod per master.

// Return a DaemonSet based on that Spec
return &extensions.DaemonSet{
ObjectMeta: metav1.ObjectMeta{
Name: "self-hosted-" + name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for mutating function with strings pushed to consts.

@luxas
Copy link
Member Author

luxas commented Jul 5, 2017

@timothysc @jamiehannaford @Kargakis PTAL, updated the PR

@luxas
Copy link
Member Author

luxas commented Jul 5, 2017

/retest

Copy link
Contributor

@jamiehannaford jamiehannaford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err)
}

if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Update(ds); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may fail on status updates in the DaemonSet so I would make this retriable on 409s. TODO is also fine.

}

if _, err := client.ExtensionsV1beta1().DaemonSets(metav1.NamespaceSystem).Update(ds); err != nil {
return fmt.Errorf("failed to create self-hosted %q daemonset [%v]", componentName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed to update

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 6, 2017
Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: luxas, timothysc
We suggest the following additional approver: jbeda

Assign the PR to them by writing /assign @jbeda in a comment when ready.

Associated issue: 127

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@luxas luxas force-pushed the kubeadm_new_selfhosting branch from 2b8a504 to 9f1c5a6 Compare July 6, 2017 17:55
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 6, 2017
@luxas
Copy link
Member Author

luxas commented Jul 6, 2017

Squashed to two logical comments. Reapplying the labels

@luxas luxas added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 6, 2017
@luxas
Copy link
Member Author

luxas commented Jul 6, 2017

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 47435, 46044)

@k8s-github-robot k8s-github-robot merged commit b00df7e into kubernetes:master Jul 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Follow-up selfhosting improvements
9 participants