Fix master regex when running multiple clusters #58561
I'm running two Kubernetes clusters on GCE, one for production and one for staging. The instance prefix I use for production is `kubernetes` and for staging it's `staging-kubernetes`. This caused a problem when running `kube-up.sh` for production: when it looks for all instances matching `kubernetes(-...)?`, it finds both the production and staging instances. This probably causes several problems, but the most noticeable one for me was that `INITIAL_ETCD_CLUSTER` was incorrect, so etcd wouldn't start up correctly, so the API server wouldn't start up correctly, and nothing else started up. I tested this manually and it seems to work for me, but I didn't write an automated test.
/ok-to-test
@@ -1868,7 +1868,7 @@ function get-master-replicas-count() {
 # Prints regexp for full master machine name. In a cluster with replicated master,
 # VM names may either be MASTER_NAME or MASTER_NAME with a suffix for a replica.
 function get-replica-name-regexp() {
-  echo "${MASTER_NAME}(-...)?"
+  echo "^${MASTER_NAME}(-...)?"
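To illustrate the effect of the added `^`, here is a minimal sketch that runs both patterns against a made-up list of instance names. The names, the `match_names` helper, and the use of `grep -E` are assumptions for illustration only; `kube-up.sh` itself may apply the regexp differently.

```shell
#!/usr/bin/env bash
# Hypothetical instance names, modeled on the ones mentioned in this PR.
instances="kubernetes-master-001
kubernetes-master-002
staging-kubernetes-master-001
staging-kubernetes-master-002"

MASTER_NAME="kubernetes-master"

# Before the change: unanchored, so the pattern can match anywhere in a name.
unanchored="${MASTER_NAME}(-...)?"
# After the change: anchored to the start of the name.
anchored="^${MASTER_NAME}(-...)?"

# Hypothetical helper: prints the instance names matched by the regex in $1.
match_names() {
  printf '%s\n' "${instances}" | grep -E "$1"
}

echo "unanchored:"
match_names "${unanchored}"
echo "anchored:"
match_names "${anchored}"
```

With the unanchored pattern, the staging names match as well because `kubernetes-master` occurs inside `staging-kubernetes-master-001`; with the anchored pattern, only names that begin with `kubernetes-master` are listed.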
why not use `kubernetes-staging` in your own env? :)
Good point :) Had I known I was gonna run into this issue I would have totally done that! I'm just hoping this will help someone else down the line.
I'm not sure I understand this change. What's the motivation of it?
I wrote a little bit about it in the PR description above, but basically, I'm running a production cluster and a staging cluster. I have GCE instances named `kubernetes-master-001`, `kubernetes-master-002`, etc., and more GCE instances named `staging-kubernetes-master-001`, `staging-kubernetes-master-002`, etc.
When running `kube-up.sh` to create another production master, `INITIAL_ETCD_CLUSTER` contains both the production and staging instances when it should contain only production instances. As a result, etcd fails to start up with errors like `member count unequal` -> the API server fails to start up -> nothing else works and the newly created master is broken.
This change makes sure that `INITIAL_ETCD_CLUSTER` only contains the production master instances.
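As a rough sketch of why the extra matches break etcd, suppose the member list is built by joining every matching instance name with commas. That construction, the `member_list` helper, and the instance names are assumptions for illustration; this is not the actual `kube-up.sh` code.

```shell
#!/usr/bin/env bash
# Hypothetical instance names for a production and a staging cluster.
all_instances="kubernetes-master-001
kubernetes-master-002
staging-kubernetes-master-001
staging-kubernetes-master-002"

# Hypothetical helper: joins the names matching regex $1 into a
# comma-separated list, standing in for how a member list could be assembled.
member_list() {
  printf '%s\n' "${all_instances}" | grep -E "$1" | paste -sd, -
}

# Unanchored pattern: staging masters leak into the production member list.
echo "before: $(member_list 'kubernetes-master(-...)?')"
# Anchored pattern: only production masters remain.
echo "after:  $(member_list '^kubernetes-master(-...)?')"
```

Under these assumptions, the "before" list names four members while only two production masters actually exist, which is the kind of mismatch that would be consistent with the `member count unequal` error.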
/assign @jszczepkowski
@wojtek-t any thoughts on this? I think it should be pretty harmless and could help some people down the line in similar situations as mine.
OK - that makes sense. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jesseshieh, wojtek-t. Associated issue requirement bypassed by: wojtek-t.
/retest
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue.