Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: Enable the Node Authorizer/Admission plugin in v1.7 #46879

Merged

Conversation

luxas
Copy link
Member

@luxas luxas commented Jun 2, 2017

What this PR does / why we need it:

This is similar to #46796, but for kubeadm.
Basically it was a part of #46796, but there were some other upgradability and compability concerns for kubeadm I took care of while working today.

Example:

$ kubeadm init --kubernetes-version v1.7.0-beta.0
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.0-beta.0
[init] Using Authorization mode: [RBAC Node]
...
$ sudo kubectl --kubeconfig=/etc/kubernetes/kubelet.conf get secret foo
Error from server (Forbidden): User "system:node:thegopher" cannot get secrets in the namespace "default".: "no path found to object" (get secrets foo)

$ echo '{"apiVersion":"v1","kind":"Node","metadata":{"name":"foo"}}' | sudo kubectl create -f - --kubeconfig=/etc/kubernetes/kubelet.conf
Error from server (Forbidden): error when creating "STDIN": nodes "foo" is forbidden: node thegopher cannot modify node foo

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Depends on #46864 (uses that PR as a base, will rebase once it's merged)

Please only review the second commit. Will also fix tests in a minute.

Release note:

kubeadm: Enable the Node Authorizer/Admission plugin in v1.7 

@mikedanese @liggitt @pipejakob @roberthbailey @jbeda @timothysc

@luxas luxas added this to the v1.7 milestone Jun 2, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 2, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: luxas

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 2, 2017
@@ -53,6 +53,9 @@ func setInitDynamicDefaults(cfg *kubeadmapi.MasterConfiguration) error {
if k8sVersion.LessThan(kubeadmconstants.MinimumControlPlaneVersion) {
return fmt.Errorf("this version of kubeadm only supports deploying clusters with the control plane version >= %s. Current version: %s", kubeadmconstants.MinimumControlPlaneVersion.String(), cfg.KubernetesVersion)
}
if k8sVersion.AtLeast(kubeadmconstants.MinimumNodeAuthorizerVersion) {
cfg.AuthorizationModes = append(cfg.AuthorizationModes, "Node")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this at the beginning, it's likely to be the fastest one, and we'd want to see the denial messages in the log if it was preventing a node from doing something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, will do.

@@ -43,6 +43,9 @@ import (
const (
DefaultCloudConfigPath = "/etc/kubernetes/cloud-config"

defaultv16AdmissionControl = "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceQuota should always be last

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't actually touched this, I guess this came from the "add DefaultTolerationsSeconds everywhere" PR or so. Will fix.

@luxas luxas force-pushed the kubeadm_enable_node_authorizer branch from fcf7332 to b17add3 Compare June 2, 2017 19:45
@@ -84,6 +85,10 @@ func CreateRBACRules(clientset *clientset.Clientset) error {
if err := createClusterRoleBindings(clientset); err != nil {
return err
}
if err := deletePermissiveNodesBindingWhenUsingNodeAuthorization(clientset); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this separately from the CreateRBACRules step... logically, it should only happen when both the Node and RBAC authorizers are in place, and doesn't the step that calls CreateRBACRules() know the version of the master it is setting up? I wouldn't expect to need to do discovery here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't made a formal proposal for it, but I think we should always enable the Node Authorizer.
Can you see any argument to not be more secure and enable it if we have the chance?

So by that I could assume the RBAC and Node Authorizer is in place on k8s >= v1.7.

Why I did discovery here is because of the coming phases split-out. Each logical phase in the setup process will be independently invokable in the future. So using the current setup I could just pass the version, but I'd like to see this more as a client against k8s only at this step.

When I make a command of this though, opting out of this deletion will be an option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @luxas's point, I think having this step in CreateRBACRules is more palatable now that we're making all of these steps idempotent, so it's really CreateOrUpdateRBACRules or MakeSureAllRBACRulesAreCorrect. This will likely be externalized to a command like kubeadm phases rbac, which means "do everything you need to in order to make sure RBAC roles are set up correctly," which involves potentially deleting outdated roles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the target version being deployed should be part of the input. I'd point out that tightening roles is generally something you would do after a deployment completed, because of version skew between components

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. It's also worthwhile waiting until we have the phase subcommand fully spec'd out (with interested consumers) before trying to implement toward it.

@@ -58,6 +59,9 @@ func CreateInitKubeConfigFiles(masterEndpoint, pkiDir, outDir string) error {
if err != nil {
return err
}
// Lowercasing here is needed, because Node names in kubernetes are lower-case, and the Node Admission plugin strips "system:node:" from the CN below
// and compares it strictly to the (kubernetes lower-cased) Node name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

point to the nodeutil.GetHostname kubelet method this is mimicking

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also note that this will need to change if cloud providers ever get involved

@@ -58,6 +59,9 @@ func CreateInitKubeConfigFiles(masterEndpoint, pkiDir, outDir string) error {
if err != nil {
return err
}
// Lowercasing here is needed, because Node names in kubernetes are lower-case, and the Node Admission plugin strips "system:node:" from the CN below
// and compares it strictly to the (kubernetes lower-cased) Node name.
hostname := strings.ToLower(hostname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and call this masterNodeName

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, kubeadm skips the cert generation step if the cert already exists. if kubeadm wants to support upgrades, it will now need to read an existing client cert and verify it is valid (not expired and subject matches what it has been asked to generate), and regenerate the cert if needed

@luxas luxas force-pushed the kubeadm_enable_node_authorizer branch from b17add3 to d4638f0 Compare June 2, 2017 20:19
@luxas
Copy link
Member Author

luxas commented Jun 2, 2017

@liggitt Updated to address your feedback. The tests are expected to fail, but it would be great if I could get a soft LGTM for the second commit and then fixup tests after the rebase...

Copy link
Contributor

@pipejakob pipejakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my reading of #46796, I'd give your second commit a soft LGTM. But of course, more (and updated) tests!

@@ -84,6 +85,10 @@ func CreateRBACRules(clientset *clientset.Clientset) error {
if err := createClusterRoleBindings(clientset); err != nil {
return err
}
if err := deletePermissiveNodesBindingWhenUsingNodeAuthorization(clientset); err != nil {
// This is not a fatal error, proceed as usual but log it.
fmt.Println("[apiconfig] Failed to remove the permissive 'system:node' ClusterRoleBinding: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not a fatal error, I would expect more explanation in the message itself about it being innocuous (unless we want to get a lot of issues from people asking about it). Also, is there really no case where this could fail and we would care? Should it be purely best-effort?

I would imagine that if someone upgrades their kubeadm cluster to 1.7, and sees the new authorizer/admission plugin in place, and reads about the new security features, it should be safe for them to assume they're actually more secure (and not that a random transient failure might have actually caused this less-secure rolebinding to persist during the upgrade). I would understand ignoring NotFound errors where there is nothing to do, but to ignore all errors and succeed the operation feels wrong to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, notfound is ignorable

@liggitt liggitt mentioned this pull request Jun 4, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2017
Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments.

@@ -42,6 +43,9 @@ import (
const (
DefaultCloudConfigPath = "/etc/kubernetes/cloud-config"

defaultv16AdmissionControl = "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota"
defaultv17AdmissionControl = "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we check the version and add NodeRestriction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically what this is doing, but since it should be in between DefaultTolerationSeconds and ResourceQuota I felt like this was the most basic and stable way to do it. Didn't want to get too fancy with slices and such things although that would also work

@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2017
@dchen1107
Copy link
Member

I added /approved label since the pr does have the associated issue. But the pr is not lgtmed before the code freeze. I am leaving the @kubernetes/sig-cluster-lifecycle-misc to decide if pr is for fixing a regression bug or a new feature. If it is for new feature, there should be an exception request.

cc/ @kubernetes/kubernetes-release-managers

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jun 6, 2017
@dchen1107 dchen1107 added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2017
@luxas luxas force-pushed the kubeadm_enable_node_authorizer branch from d4638f0 to 9b65a05 Compare June 6, 2017 06:32
@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 6, 2017
@luxas luxas force-pushed the kubeadm_enable_node_authorizer branch from 9b65a05 to fccedaf Compare June 6, 2017 17:30
@luxas
Copy link
Member Author

luxas commented Jun 6, 2017

Rebased upon #46864, so ready to be merged BUT this will have to wait until #47021 is in.

@liggitt Updated to address your feedback, PTAL

@k8s-bot pull-kubernetes-e2e-kubeadm-gce test this

I think we'll have a green run now... unless we run into quota issues for the job

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2017
@luxas
Copy link
Member Author

luxas commented Jun 6, 2017

/retest

@timothysc timothysc added the kind/bug Categorizes issue or PR as related to a bug. label Jun 6, 2017
@timothysc
Copy link
Member

@dchen1107 this is more an enablement/fix around a feature that was added to this release. Let me know if think we need an exception on this.

cc/ @kubernetes/kubernetes-release-managers

@luxas
Copy link
Member Author

luxas commented Jun 6, 2017

@dchen1107 So this PR was initially a commit in #46796. #46796 was approved/lgtm'd before the code freeze. Then I noticed a potential source to a problem in the commit that enabled the Node Authorizer for kubeadm in PR #46796, so I asked if I could move the enablement of the Node Authorizer for kubeadm to an other PR instead. So I did. That is this PR.

It should also be mentioned that the Node Authorizer feature merged one-two days before the code freeze, leaving us (kubeadm folks) with a minimal amount of time to enable it in the code. This change is pretty straightforward.

This PR also fixes an other breakage made in the massive code freeze rush. The Initializers admission controller was (blindly) added in an automated fashon across the codebase. However, that makes it impossible to deploy v1.6 clusters with kubeadm if it insists to always set the Initializers admission controller (which didn't exist in v1.6).

TL;DR; This is a bug fix for kubeadm AND an enablement of a feature that did make the deadline (the Node Authorizer feature), where the original enablement PR also was lgtm'd before the code freeze.

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a minor comment.

{[]string{"RBAC"}, "v1.7.0", []string{"Node", "RBAC"}},
{[]string{"RBAC", "Webhook"}, "v1.7.0", []string{"Node", "RBAC", "Webhook"}},
{[]string{"RBAC", "Webhook", "Node"}, "v1.7.0", []string{"RBAC", "Webhook", "Node"}},
{[]string{"Node", "RBAC", "Webhook"}, "v1.7.0", []string{"Node", "RBAC", "Webhook"}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a -beta example test too for completeness.

@dchen1107
Copy link
Member

Re: #46879 (comment)

Thanks for the detail information. I am ok with this. @kubernetes/kubernetes-release-managers WDYT?

@luxas luxas force-pushed the kubeadm_enable_node_authorizer branch from fccedaf to b7700ef Compare June 7, 2017 18:08
@luxas
Copy link
Member Author

luxas commented Jun 7, 2017

Thank you @dchen1107!

@timothysc Updated the unit test to test beta versions as well

I'm feature-gating the Node Authorizer and Initializer admission controller behind beta.1, since some required parts of that work was done after beta.0.

So after the beta.1 cut, this PR may be merged and the kubeadm e2e CI will turn green again @pipejakob :)

@luxas luxas added do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 7, 2017
@timothysc
Copy link
Member

@luxas @liggitt Are there any e2e's that validate this feature?

@luxas
Copy link
Member Author

luxas commented Jun 7, 2017

@timothysc I was thinking we could start with adding the two lines in the PR description as tests in kubernetes-anywhere. It would be super cool if there were ready-made e2e's as well (which there probably is). In case there is, we should match them in the kubeadm job for sure

@liggitt
Copy link
Member

liggitt commented Jun 7, 2017

When the confirmance suite is run with this configuration on, every time a node runs a pod and reports pod status, this authorizer and admission plugin are exercised.

Negative verification exists as an integration test only today - https://github.com/kubernetes/kubernetes/blob/master/test/integration/auth/node_test.go

@timothysc
Copy link
Member

"with this configuration on"

What is the plan for defaulting on?

@liggitt
Copy link
Member

liggitt commented Jun 7, 2017

What is the plan for defaulting on?

Up to each deployment method. gce/gke is enabling in #46796

with this PR, the conformance e2e suite run against a kubeadm-setup cluster would exercise the authorizer/admission

@liggitt
Copy link
Member

liggitt commented Jun 7, 2017

don't the bugfixes need to be made now, separately, in order for beta.1 to cut if we want the kubeadm CI job to be release blocking?

@luxas
Copy link
Member Author

luxas commented Jun 7, 2017

@liggitt No, actually not required. The kubeadm job builds kubeadm from master but uses control plane images from latest release. So once beta.1 is cut with Initializers support and this PR is merged it should go green.

I'm open to merging this as-is now though if we want to have this in beta.1, either way works

k8s-github-robot pushed a commit that referenced this pull request Jun 7, 2017
Automatic merge from submit-queue (batch tested with PRs 47024, 47050, 47086, 47081, 47013)

kubeadm: Make the creation of the RBAC rules phase idempotent

**What this PR does / why we need it**:

Bugfix: Currently kubeadm fails with a non-zero code if resources it's trying to create already exist. This PR fixes that by making kubeadm try to Update resources that already exist.

After this PR, #46879 and a beta.1 release, kubeadm will be fully upgradeable from v1.6 to v1.7 using only kubeadm init.

Last piece of kubernetes/kubeadm#288

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes: kubernetes/kubeadm#288

**Special notes for your reviewer**:

**Release note**:

```release-note
kubeadm: Modifications to cluster-internal resources installed by kubeadm will be overwritten when upgrading from v1.6 to v1.7.
```
@pipejakob @mikedanese @timothysc
@luxas
Copy link
Member Author

luxas commented Jun 8, 2017

Removing do-not-merge as beta.1 was cut 👍

@k8s-bot pull-kubernetes-e2e-kubeadm-gce test this

/retest

@luxas luxas removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jun 8, 2017
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 8, 2017

@luxas: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-federation-e2e-gce d4638f0b1e86dc6d9c7b98bdf6c1c874ea8c2a84 link @k8s-bot pull-kubernetes-federation-e2e-gce test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@luxas
Copy link
Member Author

luxas commented Jun 8, 2017

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants