Allow running kube-proxy as a DaemonSet when using kube-up.sh on GCE #50705
Conversation
cluster/gce/gci/configure-helper.sh (Outdated)
@@ -25,6 +25,8 @@ set -o errexit
 set -o nounset
 set -o pipefail

+LOCAL_NODE_LABELS="${NODE_LABELS:-}"
Setting the `kube-proxy-ds-ready` label only on nodes so that kube-proxy won't be scheduled on the master. This is because the DaemonSet controller does not respect the `node.spec.unschedulable` field, and we don't set a taint on the master by default.
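A minimal sketch of how the label described above could be applied only to non-master nodes (this is not the actual configure-helper.sh code; the function name and the `beta.kubernetes.io/` label prefix are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Sketch: append the kube-proxy-ds-ready label only for worker nodes.
# The DaemonSet controller ignores node.spec.unschedulable, so this
# label (rather than unschedulability) keeps kube-proxy off the master.
# The "beta.kubernetes.io/" prefix here is an assumption for illustration.
compute_node_labels() {
  local role="$1"                 # "master" or "node"
  local labels="${NODE_LABELS:-}" # labels passed down via kube-env
  if [[ "${role}" != "master" ]]; then
    labels="${labels:+${labels},}beta.kubernetes.io/kube-proxy-ds-ready=true"
  fi
  echo "${labels}"
}

compute_node_labels "node"
compute_node_labels "master"
```

The kube-proxy DaemonSet would then select on this label via a `nodeSelector`, so masters (which never get the label) never run the pod.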
I'm a bit concerned about defining this `LOCAL_NODE_LABELS` env var, as it differs from the `NODE_LABELS` set in kube-env and may introduce inconsistency and confusion.
Another possible option is to set a `NoSchedule` taint on the master (as defined by #39112). From #43537 it seems that requires a comprehensive plan, but there is no clear timeline or owner yet, so we may not want to block this on that.
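For reference, the taint-based alternative could look roughly like this (a sketch only; the helper name is made up, and the exact taint key proposed in #39112 may differ):

```shell
# Sketch of the NoSchedule-taint alternative: a taint on the master
# would repel kube-proxy pods that lack a matching toleration, without
# needing a node label. The taint key below is illustrative.
build_taint_cmd() {
  local master="$1"
  echo "kubectl taint nodes ${master} node-role.kubernetes.io/master=:NoSchedule"
}

build_taint_cmd "my-cluster-master"
```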
# Please keep kube-proxy configuration in-sync with:
# cluster/saltbase/salt/kube-proxy/kube-proxy.manifest

apiVersion: extensions/v1beta1
Forgot to cc sig-node as well. @kubernetes/sig-node-pr-reviews
kube-proxy as a DaemonSet would only be an alpha feature in 1.8. The rollback mechanism suggested by @thockin in #50705 (comment) seems sufficient to me.
Would having this as an alpha feature help test and stabilize pod priority and scheduler preemption?
First, no one told me this is an alpha feature. According to @MrHohn, his original proposal treated this as an alpha feature, but that was dropped later. The new logic introduced by the pod priority feature is gated behind an alpha feature gate. Also, even if we plan to make this alpha here, it will be very error-prone. Asking @bowei @gmarek how painful and messy it is to introduce an alpha feature gate through the kube-up scripts. To follow @thockin's suggestion, @MrHohn would have to add extra complexity to his logic in the already-messy kube-up. He would have to introduce two vars:
If sig-scalability and sig-scheduling have confidence in the rescheduler (#50705 (comment)), and we have a low chance of running into case 3), then yes, let's take the risk and do this. I have wanted to get rid of those static pods from worker nodes for a long time. But if there is no such confidence, why the rush? We don't have vertical scaling yet; we don't have a GKE deployment, and the addon manager can't integrate that feature yet. No gain in the short term, only risk.
Sure, this sounds good to me from the scheduling point of view. You should make sure that sig-network is fine with this.
I believe this was intended to be an Alpha feature, but I could be wrong (of course). I don't doubt that it's a pain to change the kube-up scripts, but isn't this necessary unless we are so confident that we can ship this as a beta/GA feature directly? If that's the preferred route (i.e., direct to beta), I agree that it's premature to introduce this feature in 1.8.
Even if this goes directly to Beta, we still need a rollback mechanism; I don't see how we can avoid that. Regardless of whether this makes 1.8 or not, I think it might be good to add the code (as a pre-alpha experimental feature?) and start a test cluster running kube-proxy as a DaemonSet with pod priority and scheduler preemption.
If this is going directly to beta, we shouldn't include a configuration option to turn it off. |
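The rollback story discussed above boils down to an env-var switch. A sketch of how the opt-in could look in kube-up (`KUBE_PROXY_DAEMONSET` is the variable named later in this thread; the helper function and default are illustrative):

```shell
# Sketch: kube-proxy runs as static pods unless the user opts in.
# Flipping the variable back and re-running kube-up is the rollback
# mechanism discussed above; the function name is made up.
kube_proxy_mode() {
  if [[ "${KUBE_PROXY_DAEMONSET:-false}" == "true" ]]; then
    echo "daemonset"
  else
    echo "static-pods"
  fi
}

kube_proxy_mode                            # default behavior
KUBE_PROXY_DAEMONSET=true kube_proxy_mode  # opt-in
```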
cluster/gce/gci/configure-helper.sh (Outdated)
  fi
  if [[ -n "${node_labels:-}" ]]; then
    flags+=" --node-labels=${node_labels}"
  fi
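The pattern in the diff above can be exercised standalone; a sketch (the function name is made up) of the conditional flag assembly:

```shell
# Sketch of the diff above: --node-labels is appended only when
# node_labels is non-empty, so nodes without labels get no flag at all.
build_kubelet_label_flag() {
  local node_labels="${1:-}"
  local flags=""
  if [[ -n "${node_labels}" ]]; then
    flags+=" --node-labels=${node_labels}"
  fi
  echo "${flags}"
}

build_kubelet_label_flag "beta.kubernetes.io/kube-proxy-ds-ready=true"
build_kubelet_label_flag ""   # prints an empty line: no flag added
```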
Put this in start-kubelet
Sorry, fixed now.
Ok, I quickly chatted with @MrHohn and @bowei so that we can move forward. We agreed that there is no way to make this an alpha feature in OSS k8s; we could only enable it on GKE alpha clusters if we wanted to. This change, along with other pending work including pod priority and preemption, vertical scaling, etc., can give us a much more stable and easier-to-manage kube-proxy and cluster in the long run, but not in 1.8 for sure. The current state of this change doesn't bring any direct value to users at this moment, except to introduce more complexity to GKE alpha clusters. In the end, all of us agreed that there is no value in enabling this for GKE alpha clusters in 1.8, since there is no opt-in option for users to choose which alpha features to run. But enabling tests for the future migration is a good idea. We decided to disable KUBE_PROXY_DAEMONSET by default and instead create a separate e2e job that enables it for testing.
Force-pushed from 277bdf2 to f4b5cdc (Compare)
Test passed. PR ready for another round.
LGTM, please squash fixup commits.
@mikedanese Thanks, squashed commits.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: MrHohn, mikedanese
Associated issue: #23225
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files.
You can indicate your approval by writing
/retest Review the full test history for this PR.
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
…-tests Automatic merge from submit-queue (batch tested with PRs 51228, 50185, 50940, 51544, 51543)

Add upgrade tests for kube-proxy daemonset migration path

**What this PR does / why we need it**: From kubernetes#23225, this is part of setting up CIs to validate the kube-proxy migration path (static pods -> daemonset and back). The other part of the work (adding real CIs that run these tests) will be in a separate PR against [kubernetes/test-infra](https://github.com/kubernetes/test-infra), though this is currently blocked by kubernetes#50705.

**Special notes for your reviewer**: cc @roberthbailey @pwittrock

**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51553, 51538, 51663, 51069, 51737)

Allow enabling the pod priority feature gate for GCE and configure priority for kube-proxy

**What this PR does / why we need it**: From #23225, this PR adds an option for users to enable the pod priority feature gate using GCE startup scripts, and configures pod priority for kube-proxy when enabled. The setup `priorityClassName: system` derives from: https://github.com/kubernetes/kubernetes/blob/ce1485c626d8cddc9dbffe24f91f7d62480148cd/staging/src/k8s.io/api/core/v1/types.go#L2536-L2542. The plan is to configure pod priority for the kube-proxy daemonset (#50705) in the same way.

**Special notes for your reviewer**: cc @bsalamat @davidopp @thockin

**Release note**:
```release-note
When using kube-up.sh on GCE, users can set env `ENABLE_POD_PRIORITY=true` to enable the pod priority feature gate.
```
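Per the release note just above, the env var ultimately maps to a component feature-gates flag. A sketch of that mapping (the helper function name is made up; `PodPriority` is the feature gate enabled by the alpha pod priority work):

```shell
# Sketch: translate the ENABLE_POD_PRIORITY env var from the release
# note into the flag that turns the alpha feature gate on. When the
# var is unset or false, no flag is emitted and the default applies.
feature_gates_flag() {
  if [[ "${ENABLE_POD_PRIORITY:-false}" == "true" ]]; then
    echo "--feature-gates=PodPriority=true"
  fi
}

feature_gates_flag                            # default: nothing
ENABLE_POD_PRIORITY=true feature_gates_flag   # opt-in
```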
What this PR does / why we need it:
From #23225, this PR adds an option for users to run kube-proxy as a DaemonSet instead of static pods when using GCE startup scripts. By default, kube-proxy still runs as static pods.
This is the first step toward moving kube-proxy into a DaemonSet on GCE; remaining tasks will be tracked in #23225.
Special notes for your reviewer:
The last commit is purely for testing out kube-proxy as a DaemonSet via CIs.
cc @kubernetes/sig-network-misc @kubernetes/sig-cluster-lifecycle-misc
Release note (Edited):