kubeadm: Enable the Node Authorizer/Admission plugin in v1.7 #46879
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: luxas The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing `/approve` in a comment.
cmd/kubeadm/app/cmd/defaults.go
```go
@@ -53,6 +53,9 @@ func setInitDynamicDefaults(cfg *kubeadmapi.MasterConfiguration) error {
	if k8sVersion.LessThan(kubeadmconstants.MinimumControlPlaneVersion) {
		return fmt.Errorf("this version of kubeadm only supports deploying clusters with the control plane version >= %s. Current version: %s", kubeadmconstants.MinimumControlPlaneVersion.String(), cfg.KubernetesVersion)
	}
	if k8sVersion.AtLeast(kubeadmconstants.MinimumNodeAuthorizerVersion) {
		cfg.AuthorizationModes = append(cfg.AuthorizationModes, "Node")
```
I would put this at the beginning; it's likely to be the fastest one, and we'd want to see the denial messages in the log if it was preventing a node from doing something.
Great, will do.
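(For reference, prepending instead of appending could be as simple as the sketch below; it assumes the same `cfg` field and `kubeadmconstants` names shown in the diff above, and is not a quote from the PR.)

```go
// Sketch: put "Node" first so its denials are the first thing visible in the
// authorizer chain's logs. The real defaulting would also need to skip this
// when "Node" is already present in the list.
if k8sVersion.AtLeast(kubeadmconstants.MinimumNodeAuthorizerVersion) {
	cfg.AuthorizationModes = append([]string{"Node"}, cfg.AuthorizationModes...)
}
```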
cmd/kubeadm/app/master/manifests.go
```go
@@ -43,6 +43,9 @@ import (
const (
	DefaultCloudConfigPath = "/etc/kubernetes/cloud-config"

	defaultv16AdmissionControl = "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds"
```
ResourceQuota should always be last
Haven't actually touched this; I guess this came from the "add DefaultTolerationSeconds everywhere" PR or so. Will fix.
Force-pushed fcf7332 to b17add3
```go
@@ -84,6 +85,10 @@ func CreateRBACRules(clientset *clientset.Clientset) error {
	if err := createClusterRoleBindings(clientset); err != nil {
		return err
	}
	if err := deletePermissiveNodesBindingWhenUsingNodeAuthorization(clientset); err != nil {
```
I would do this separately from the CreateRBACRules step... logically, it should only happen when both the Node and RBAC authorizers are in place, and doesn't the step that calls CreateRBACRules() know the version of the master it is setting up? I wouldn't expect to need to do discovery here
Haven't made a formal proposal for it, but I think we should always enable the Node Authorizer.
Can you see any argument to not be more secure and enable it if we have the chance?
So by that I could assume the RBAC and Node Authorizers are in place on k8s >= v1.7.
Why I did discovery here is because of the coming phases split-out. Each logical phase in the setup process will be independently invokable in the future. So using the current setup I could just pass the version, but I'd like to see this more as a client against k8s only at this step.
When I make a command of this though, opting out of this deletion will be an option.
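For what it's worth, the discovery approach described here would look roughly like the sketch below; the clientset/version helper names are assumptions based on what client-go and kubeadm use elsewhere, not code from this PR.

```go
// Sketch: discover the API server version at runtime instead of having the
// caller pass it in, so this phase can act as a plain client against k8s.
serverVersion, err := clientset.Discovery().ServerVersion()
if err != nil {
	return fmt.Errorf("failed to discover the API server version: %v", err)
}
k8sVersion, err := version.ParseSemantic(serverVersion.GitVersion)
if err != nil {
	return fmt.Errorf("couldn't parse API server version %q: %v", serverVersion.GitVersion, err)
}
if k8sVersion.AtLeast(kubeadmconstants.MinimumNodeAuthorizerVersion) {
	// On >= v1.7 both the Node and RBAC authorizers are assumed to be in
	// place, so the permissive system:node binding can be removed.
}
```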
To @luxas's point, I think having this step in CreateRBACRules is more palatable now that we're making all of these steps idempotent, so it's really CreateOrUpdateRBACRules or MakeSureAllRBACRulesAreCorrect. This will likely be externalized to a command like kubeadm phases rbac, which means "do everything you need to in order to make sure RBAC roles are set up correctly," which involves potentially deleting outdated roles.
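A rough sketch of that create-or-update shape (assuming the rbac/v1beta1 client-go API of this release, with `rbac` as the v1beta1 RBAC types package and `apierrors` as k8s.io/apimachinery/pkg/api/errors; not kubeadm's actual helper names):

```go
// Sketch: create the binding, and fall back to Update when it already exists,
// so the RBAC phase can be re-run without failing.
func createOrUpdateClusterRoleBinding(clientset *clientset.Clientset, crb *rbac.ClusterRoleBinding) error {
	if _, err := clientset.RbacV1beta1().ClusterRoleBindings().Create(crb); err != nil {
		if !apierrors.IsAlreadyExists(err) {
			return fmt.Errorf("unable to create ClusterRoleBinding: %v", err)
		}
		if _, err := clientset.RbacV1beta1().ClusterRoleBindings().Update(crb); err != nil {
			return fmt.Errorf("unable to update ClusterRoleBinding: %v", err)
		}
	}
	return nil
}
```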
Then the target version being deployed should be part of the input. I'd point out that tightening roles is generally something you would do after a deployment completed, because of version skew between components
That's fair. It's also worthwhile waiting until we have the phase subcommand fully spec'd out (with interested consumers) before trying to implement toward it.
```go
@@ -58,6 +59,9 @@ func CreateInitKubeConfigFiles(masterEndpoint, pkiDir, outDir string) error {
	if err != nil {
		return err
	}
	// Lowercasing here is needed, because Node names in kubernetes are lower-case, and the Node Admission plugin strips "system:node:" from the CN below
	// and compares it strictly to the (kubernetes lower-cased) Node name.
```
point to the nodeutil.GetHostname kubelet method this is mimicking
also note that this will need to change if cloud providers ever get involved
```go
@@ -58,6 +59,9 @@ func CreateInitKubeConfigFiles(masterEndpoint, pkiDir, outDir string) error {
	if err != nil {
		return err
	}
	// Lowercasing here is needed, because Node names in kubernetes are lower-case, and the Node Admission plugin strips "system:node:" from the CN below
	// and compares it strictly to the (kubernetes lower-cased) Node name.
	hostname := strings.ToLower(hostname)
```
and call this masterNodeName
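i.e. roughly (a sketch mirroring nodeutil.GetHostname's trim-and-lowercase behavior, with the variable renamed as suggested; not the final code):

```go
// Sketch: derive the master's Node name the same way the kubelet does
// (os.Hostname, trimmed and lower-cased), and call it masterNodeName.
hostname, err := os.Hostname()
if err != nil {
	return err
}
masterNodeName := strings.ToLower(strings.TrimSpace(hostname))
// masterNodeName is then what goes into the "system:node:<name>" CN below.
```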
currently, kubeadm skips the cert generation step if the cert already exists. if kubeadm wants to support upgrades, it will now need to read an existing client cert and verify it is valid (not expired and subject matches what it has been asked to generate), and regenerate the cert if needed
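A minimal sketch of the kind of check being asked for here (hypothetical helper and placement, not kubeadm's actual upgrade logic):

```go
package certs // hypothetical placement, for illustration only

import (
	"crypto/x509"
	"time"
)

// certCanBeReused sketches the check described above: only skip regeneration
// if the existing client cert is not expired and its subject still matches
// what kubeadm has been asked to generate.
func certCanBeReused(cert *x509.Certificate, expectedCN string, expectedOrgs []string) bool {
	if time.Now().After(cert.NotAfter) {
		return false // expired, must regenerate
	}
	if cert.Subject.CommonName != expectedCN {
		return false // e.g. the "system:node:<name>" CN changed
	}
	if len(expectedOrgs) > 0 && len(cert.Subject.Organization) > 0 &&
		cert.Subject.Organization[0] != expectedOrgs[0] {
		return false // e.g. the "system:nodes" group changed
	}
	return true
}
```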
Force-pushed b17add3 to d4638f0
@liggitt Updated to address your feedback. The tests are expected to fail, but it would be great if I could get a soft LGTM for the second commit and then fix up the tests after the rebase...
Based on my reading of #46796, I'd give your second commit a soft LGTM. But of course, more (and updated) tests!
```go
@@ -84,6 +85,10 @@ func CreateRBACRules(clientset *clientset.Clientset) error {
	if err := createClusterRoleBindings(clientset); err != nil {
		return err
	}
	if err := deletePermissiveNodesBindingWhenUsingNodeAuthorization(clientset); err != nil {
		// This is not a fatal error, proceed as usual but log it.
		fmt.Println("[apiconfig] Failed to remove the permissive 'system:node' ClusterRoleBinding: %v", err)
```
If it's not a fatal error, I would expect more explanation in the message itself about it being innocuous (unless we want to get a lot of issues from people asking about it). Also, is there really no case where this could fail and we would care? Should it be purely best-effort?
I would imagine that if someone upgrades their kubeadm cluster to 1.7, and sees the new authorizer/admission plugin in place, and reads about the new security features, it should be safe for them to assume they're actually more secure (and not that a random transient failure might have actually caused this less-secure rolebinding to persist during the upgrade). I would understand ignoring NotFound errors where there is nothing to do, but to ignore all errors and succeed the operation feels wrong to me.
Agree, NotFound is ignorable.
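i.e. something along these lines (a sketch; assumes the rbac/v1beta1 client, metav1, and apierrors = k8s.io/apimachinery/pkg/api/errors, not the PR's final code):

```go
// Sketch: treat NotFound as "nothing to do", but surface every other error
// instead of logging it and claiming success.
err := clientset.RbacV1beta1().ClusterRoleBindings().Delete("system:node", &metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
	return fmt.Errorf("unable to remove the permissive 'system:node' ClusterRoleBinding: %v", err)
}
```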
minor comments.
cmd/kubeadm/app/master/manifests.go
```go
@@ -42,6 +43,9 @@ import (
const (
	DefaultCloudConfigPath = "/etc/kubernetes/cloud-config"

	defaultv16AdmissionControl = "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota"
	defaultv17AdmissionControl = "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota"
```
Couldn't we check the version and add NodeRestriction?
That's basically what this is doing, but since it should be in between DefaultTolerationSeconds and ResourceQuota, I felt like this was the most basic and stable way to do it. Didn't want to get too fancy with slices and such, although that would also work.
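For the record, the slice-based alternative could look roughly like this sketch (the two constant strings above are what the PR actually uses):

```go
// Sketch: insert NodeRestriction right before ResourceQuota instead of
// maintaining two hard-coded admission-control strings.
plugins := strings.Split(defaultv16AdmissionControl, ",")
for i, p := range plugins {
	if p == "ResourceQuota" {
		plugins = append(plugins[:i], append([]string{"NodeRestriction"}, plugins[i:]...)...)
		break
	}
}
admissionControl := strings.Join(plugins, ",") // the v1.7 value
```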
I added the /approved label since the PR does have an associated issue. But the PR was not LGTM'd before the code freeze. I am leaving it to @kubernetes/sig-cluster-lifecycle-misc to decide whether this PR fixes a regression bug or adds a new feature. If it is a new feature, there should be an exception request. cc @kubernetes/kubernetes-release-managers
Force-pushed d4638f0 to 9b65a05
Force-pushed 9b65a05 to fccedaf
/retest
@dchen1107 this is more an enablement/fix around a feature that was added to this release. Let me know if you think we need an exception on this. cc @kubernetes/kubernetes-release-managers
@dchen1107 So this PR was initially a commit in #46796. #46796 was approved/LGTM'd before the code freeze. Then I noticed a potential source of a problem in the commit that enabled the Node Authorizer for kubeadm in PR #46796, so I asked if I could move the enablement of the Node Authorizer for kubeadm to another PR instead. So I did; that is this PR. It should also be mentioned that the Node Authorizer feature merged one or two days before the code freeze, leaving us (kubeadm folks) with a minimal amount of time to enable it in the code. This change is pretty straightforward. This PR also fixes another breakage made in the massive code-freeze rush. TL;DR: This is a bug fix for kubeadm AND an enablement of a feature that did make the deadline (the Node Authorizer feature), where the original enablement PR was also LGTM'd before the code freeze.
Only a minor comment.
cmd/kubeadm/app/cmd/defaults_test.go
```go
{[]string{"RBAC"}, "v1.7.0", []string{"Node", "RBAC"}},
{[]string{"RBAC", "Webhook"}, "v1.7.0", []string{"Node", "RBAC", "Webhook"}},
{[]string{"RBAC", "Webhook", "Node"}, "v1.7.0", []string{"RBAC", "Webhook", "Node"}},
{[]string{"Node", "RBAC", "Webhook"}, "v1.7.0", []string{"Node", "RBAC", "Webhook"}},
```
Could you add a -beta example test too, for completeness?
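e.g. rows like these (illustrative only; the expected output depends on where the version gate ends up):

```go
// Illustrative pre-release test rows in the same table-driven style as above.
{[]string{"RBAC"}, "v1.7.0-beta.1", []string{"Node", "RBAC"}},
{[]string{"RBAC"}, "v1.7.0-alpha.3", []string{"RBAC"}},
```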
Re: #46879 (comment) Thanks for the detailed information. I am OK with this. @kubernetes/kubernetes-release-managers WDYT?
…elated to the enablement
Force-pushed fccedaf to b7700ef
Thank you @dchen1107! @timothysc Updated the unit test to test beta versions as well. I'm feature-gating the Node Authorizer and Initializer admission controller behind beta.1, since some required parts of that work were done after beta.0. So after the beta.1 cut, this PR may be merged and the kubeadm e2e CI will turn green again @pipejakob :)
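(For context, gating behind beta.1 essentially boils down to the constant the defaulting code compares against, roughly as in the sketch below; the exact parse helper is an assumption.)

```go
// Sketch: require at least v1.7.0-beta.1 before defaulting the Node
// authorizer / NodeRestriction plugin on, since the required server-side
// pieces landed after beta.0.
MinimumNodeAuthorizerVersion = version.MustParseSemantic("v1.7.0-beta.1")
```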
@timothysc I was thinking we could start with adding the two lines in the PR description as tests in kubernetes-anywhere. It would be super cool if there were ready-made e2e's as well (which there probably are). In case there are, we should match them in the kubeadm job for sure.
When the conformance suite is run with this configuration on, every time a node runs a pod and reports pod status, this authorizer and admission plugin are exercised. Negative verification exists as an integration test only today - https://github.com/kubernetes/kubernetes/blob/master/test/integration/auth/node_test.go
What is the plan for defaulting this on?
Up to each deployment method. GCE/GKE is enabling it in #46796. With this PR, the conformance e2e suite run against a kubeadm-set-up cluster would exercise the authorizer/admission plugin.
don't the bugfixes need to be made now, separately, in order for beta.1 to cut, if we want the kubeadm CI job to be release-blocking?
@liggitt No, actually not required. The kubeadm job builds kubeadm from master but uses control plane images from the latest release. So once beta.1 is cut with Initializers support and this PR is merged, it should go green. I'm open to merging this as-is now though if we want to have this in beta.1; either way works.
Automatic merge from submit-queue (batch tested with PRs 47024, 47050, 47086, 47081, 47013)

kubeadm: Make the creation of the RBAC rules phase idempotent

**What this PR does / why we need it**: Bugfix: Currently kubeadm fails with a non-zero code if resources it's trying to create already exist. This PR fixes that by making kubeadm try to Update resources that already exist. After this PR, #46879 and a beta.1 release, kubeadm will be fully upgradeable from v1.6 to v1.7 using only kubeadm init. Last piece of kubernetes/kubeadm#288

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes: kubernetes/kubeadm#288

**Special notes for your reviewer**:

**Release note**:
```release-note
kubeadm: Modifications to cluster-internal resources installed by kubeadm will be overwritten when upgrading from v1.6 to v1.7.
```
@pipejakob @mikedanese @timothysc
Removing do-not-merge as beta.1 was cut 👍
@k8s-bot pull-kubernetes-e2e-kubeadm-gce test this
/retest
@luxas: The following test failed, say /retest to rerun it.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
Automatic merge from submit-queue
What this PR does / why we need it:
This is similar to #46796, but for kubeadm.
Basically it was a part of #46796, but there were some other upgradability and compatibility concerns for kubeadm I took care of while working today.
Example:
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:
Depends on #46864 (uses that PR as a base, will rebase once it's merged)
Please only review the second commit. Will also fix tests in a minute.
Release note:
@mikedanese @liggitt @pipejakob @roberthbailey @jbeda @timothysc