-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete all firewall rules (and optionally network) on GCE/GKE cluster teardown #34577
Conversation
@k8s-bot test this |
Jenkins unit/integration failed for commit ef309fc118d82bf823af1385d67da4be418442d7. Full PR test history. The magic incantation to run this job again is |
I think this is ready for review. Anyone in @kubernetes/test-infra-maintainers want to take a look? |
a9f3225
to
f1fa21c
Compare
@@ -1373,7 +1385,7 @@ function kube-down() { | |||
function get-replica-name() { | |||
echo $(gcloud compute instances list \ | |||
--project "${PROJECT}" \ | |||
--zone "${ZONE}" \ | |||
--zones "${ZONE}" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume the flag is renamed? I saw other scripts still using --zone and might need to be renamed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gcloud
has been pitching a warning for a while now, yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1373,7 +1385,7 @@ function kube-down() { | |||
function get-replica-name() { | |||
echo $(gcloud compute instances list \ | |||
--project "${PROJECT}" \ | |||
--zone "${ZONE}" \ | |||
--zones "${ZONE}" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gcloud
has been pitching a warning for a while now, yeah.
|
||
# It's unfortunate that the $FIREWALL_SSH rule and network are created in | ||
# kube-up, but we can only really delete them in test-teardown. So much for | ||
# symmetry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this statement is true. Actually, I don't understand why the gke
provider structures it this way, and I'm scraping my memory for this at all. It's really awkward looking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one is at least partially your fault. 😀
Basically, I guess firewall deletion happens faster if the cluster has already been turned down, so we do that first. I'd prefer to do the network deletion in kube-down
(since it's created in kube-up
), but we can'd do that unless we delete the firewalls first, and some of those firewalls are created by test-setup
, and so should logically only be deleted in test-teardown
.
I don't really understand why we have both test-setup
/kube-up
and test-teardown
/kube-down
everywhere. But addressing that is possibly a larger task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just parallelized them there. We actually don't want to delete the firewall rules until after the container clusters delete
, but I can't tell why we don't just delete them in kube-down
after the call to container clusters delete
, except that they're test-only. I suppose this is mostly irrelevant, though, because the GKE provider is really a synthetic provider for test.
@@ -35,6 +35,7 @@ REGISTER_MASTER_KUBELET=${REGISTER_MASTER:-true} | |||
PREEMPTIBLE_NODE=${PREEMPTIBLE_NODE:-false} | |||
PREEMPTIBLE_MASTER=${PREEMPTIBLE_MASTER:-false} | |||
KUBE_DELETE_NODES=${KUBE_DELETE_NODES:-true} | |||
KUBE_DELETE_NETWORK=${KUBE_DELETE_NETWORK:-true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right default for config-default.sh
(vs config-test.sh
?) (Same question for GKE provider.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could leave these as false
for the default case. I'm not sure if there was a strong reason for leaking the firewalls/networks by default besides laziness (@thockin's word, not mine. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for cleaning up after ourselves by default unless there is a good reason to do otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of a good reason to intentionally leak, except MAYBE that we don't know everything we leaked, and we don't want to nuke things accidentally...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switched to false in config-default.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternately this could be false if $NETWORK
is "default", true otherwise, but I don't know how much logic we want to embed here.
if [[ -n $(gcloud compute networks --project "${PROJECT}" describe "${NETWORK}" --format='value(name)' 2>/dev/null || true) ]]; then | ||
if ! gcloud compute networks delete --project "${PROJECT}" --quiet "${NETWORK}"; then | ||
echo "Failed to delete network '${NETWORK}'. Listing firewall-rules:" | ||
gcloud compute firewall-rules --project "${PROJECT}" list --filter="network=${NETWORK}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
firewalls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the command is gcloud compute firewall-rules
, so I'm not sure what your comment means.
|
||
function delete-network() { | ||
if [[ -n $(gcloud compute networks --project "${PROJECT}" describe "${NETWORK}" --format='value(name)' 2>/dev/null || true) ]]; then | ||
if ! gcloud compute networks delete --project "${PROJECT}" --quiet "${NETWORK}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, this is a lot more heavy-handed than I expected. I do NOT think this should be teh default for non-test, because the default network is "default", and many people launch into an existing project. This PR will totally nuke my project, if I understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, misreading, nevermind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I was right - this will delete the default net. Taht would be bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd probably fail to delete the network, assuming there are other things using the network (e.g. firewall rules with a different prefix).
in any case, you're right that this is a bit scary, so I've updated this so we only delete the network by default for test. (I still have this deleting the cluster firewall rules by default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't, typically, because it takes all the dependencies being gone before the default
network can be deleted using gcloud
(unlike the webconsole, gcloud is just trying to delete this single object). That's not usually going to be the case because default
has other fw rules in it to start (the default-allow
rules, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, as I commented above, I don't think it should be the default for config-default.sh
, simply because it'll usually error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I was commenting on a non-refreshed copy, apparently, so this thread looks all jacked up. LGTM.
nevermind, didn't read the echo On Wed, Oct 12, 2016 at 3:47 PM, Jeff Grafton notifications@github.com
|
I will not have time to circle back to this - thanks for addressing my On Wed, Oct 12, 2016 at 4:00 PM, Tim Hockin thockin@google.com wrote:
|
@k8s-bot test this please |
Jenkins GCI GCE e2e failed for commit ee6e924. Full PR test history. The magic incantation to run this job again is |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
…77-origin-release-1.4 Automated cherry pick of #34577
@fabioy Any plan to cherrypick this to 1.3 too? |
…ck-of-#34577-origin-release-1.4 Automated cherry pick of kubernetes#34577
Not entirely ready for review yet; I want to see what Jenkins thinks of this.
This change is