Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete all firewall rules (and optionally network) on GCE/GKE cluster teardown #34577

Merged
merged 2 commits into from
Oct 13, 2016

Conversation

ixdy
Copy link
Member

@ixdy ixdy commented Oct 11, 2016

Not entirely ready for review yet; I want to see what Jenkins thinks of this.


This change is Reviewable

@ixdy ixdy added the area/test label Oct 11, 2016
@ixdy ixdy self-assigned this Oct 11, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Oct 11, 2016
@ixdy
Copy link
Member Author

ixdy commented Oct 12, 2016

@k8s-bot test this

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit ef309fc118d82bf823af1385d67da4be418442d7. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@ixdy ixdy removed their assignment Oct 12, 2016
@ixdy
Copy link
Member Author

ixdy commented Oct 12, 2016

I think this is ready for review. Anyone in @kubernetes/test-infra-maintainers want to take a look?

@ixdy ixdy force-pushed the cleanup-network branch 2 times, most recently from a9f3225 to f1fa21c Compare October 12, 2016 21:56
@@ -1373,7 +1385,7 @@ function kube-down() {
function get-replica-name() {
echo $(gcloud compute instances list \
--project "${PROJECT}" \
--zone "${ZONE}" \
--zones "${ZONE}" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume the flag is renamed? I saw other scripts still using --zone and might need to be renamed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcloud has been pitching a warning for a while now, yeah.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, most gcloud compute list calls use --zones now instead of --zone. #29851 fixed most of these, but missed this one (which was added by #29995).

I left this change as a separate commit to make that clear.

@ixdy ixdy added release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 12, 2016
@@ -1373,7 +1385,7 @@ function kube-down() {
function get-replica-name() {
echo $(gcloud compute instances list \
--project "${PROJECT}" \
--zone "${ZONE}" \
--zones "${ZONE}" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcloud has been pitching a warning for a while now, yeah.


# It's unfortunate that the $FIREWALL_SSH rule and network are created in
# kube-up, but we can only really delete them in test-teardown. So much for
# symmetry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this statement is true. Actually, I don't understand why the gke provider structures it this way, and I'm scraping my memory for this at all. It's really awkward looking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is at least partially your fault. 😀

#26664

Basically, I guess firewall deletion happens faster if the cluster has already been turned down, so we do that first. I'd prefer to do the network deletion in kube-down (since it's created in kube-up), but we can'd do that unless we delete the firewalls first, and some of those firewalls are created by test-setup, and so should logically only be deleted in test-teardown.

I don't really understand why we have both test-setup/kube-up and test-teardown/kube-down everywhere. But addressing that is possibly a larger task.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just parallelized them there. We actually don't want to delete the firewall rules until after the container clusters delete, but I can't tell why we don't just delete them in kube-down after the call to container clusters delete, except that they're test-only. I suppose this is mostly irrelevant, though, because the GKE provider is really a synthetic provider for test.

@@ -35,6 +35,7 @@ REGISTER_MASTER_KUBELET=${REGISTER_MASTER:-true}
PREEMPTIBLE_NODE=${PREEMPTIBLE_NODE:-false}
PREEMPTIBLE_MASTER=${PREEMPTIBLE_MASTER:-false}
KUBE_DELETE_NODES=${KUBE_DELETE_NODES:-true}
KUBE_DELETE_NETWORK=${KUBE_DELETE_NETWORK:-true}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right default for config-default.sh (vs config-test.sh?) (Same question for GKE provider.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could leave these as false for the default case. I'm not sure if there was a strong reason for leaking the firewalls/networks by default besides laziness (@thockin's word, not mine. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for cleaning up after ourselves by default unless there is a good reason to do otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a good reason to intentionally leak, except MAYBE that we don't know everything we leaked, and we don't want to nuke things accidentally...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to false in config-default.sh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternately this could be false if $NETWORK is "default", true otherwise, but I don't know how much logic we want to embed here.

if [[ -n $(gcloud compute networks --project "${PROJECT}" describe "${NETWORK}" --format='value(name)' 2>/dev/null || true) ]]; then
if ! gcloud compute networks delete --project "${PROJECT}" --quiet "${NETWORK}"; then
echo "Failed to delete network '${NETWORK}'. Listing firewall-rules:"
gcloud compute firewall-rules --project "${PROJECT}" list --filter="network=${NETWORK}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

firewalls?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the command is gcloud compute firewall-rules, so I'm not sure what your comment means.


function delete-network() {
if [[ -n $(gcloud compute networks --project "${PROJECT}" describe "${NETWORK}" --format='value(name)' 2>/dev/null || true) ]]; then
if ! gcloud compute networks delete --project "${PROJECT}" --quiet "${NETWORK}"; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this is a lot more heavy-handed than I expected. I do NOT think this should be teh default for non-test, because the default network is "default", and many people launch into an existing project. This PR will totally nuke my project, if I understand it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, misreading, nevermind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I was right - this will delete the default net. Taht would be bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd probably fail to delete the network, assuming there are other things using the network (e.g. firewall rules with a different prefix).

in any case, you're right that this is a bit scary, so I've updated this so we only delete the network by default for test. (I still have this deleting the cluster firewall rules by default)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't, typically, because it takes all the dependencies being gone before the default network can be deleted using gcloud (unlike the webconsole, gcloud is just trying to delete this single object). That's not usually going to be the case because default has other fw rules in it to start (the default-allow rules, etc.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, as I commented above, I don't think it should be the default for config-default.sh, simply because it'll usually error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I was commenting on a non-refreshed copy, apparently, so this thread looks all jacked up. LGTM.

@ixdy ixdy changed the title Delete all firewall rules and network on GCE/GKE cluster teardown Delete all firewall rules (and optionally network) on GCE/GKE cluster teardown Oct 12, 2016
@spxtr
Copy link
Contributor

spxtr commented Oct 12, 2016

@thockin
Copy link
Member

thockin commented Oct 12, 2016

nevermind, didn't read the echo

On Wed, Oct 12, 2016 at 3:47 PM, Jeff Grafton notifications@github.com
wrote:

@ixdy commented on this pull request.

In cluster/gce/util.sh
#34577:

+function delete-firewall-rules() {

  • for fw in $@; do
  • if [[ -n $(gcloud compute firewall-rules --project "${PROJECT}" describe "${fw}" --format='value(name)' 2>/dev/null || true) ]]; then
  •  gcloud compute firewall-rules delete --project "${PROJECT}" --quiet "${fw}" &
    
  • fi
  • done
  • kube::util::wait-for-jobs || {
  • echo -e "${color_red}Failed to delete firewall rules.${color_norm}" >&2
  • }
    +}

+function delete-network() {

  • if [[ -n $(gcloud compute networks --project "${PROJECT}" describe "${NETWORK}" --format='value(name)' 2>/dev/null || true) ]]; then
  • if ! gcloud compute networks delete --project "${PROJECT}" --quiet "${NETWORK}"; then
  •  echo "Failed to delete network '${NETWORK}'. Listing firewall-rules:"
    
  •  gcloud compute firewall-rules --project "${PROJECT}" list --filter="network=${NETWORK}"
    

the command is gcloud compute firewall-rules, so I'm not sure what your
comment means.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#34577, or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVDrFMfOO4iJCWxkq2cZytEOdhJY8ks5qzWN1gaJpZM4KUOgw
.

@thockin
Copy link
Member

thockin commented Oct 12, 2016

I will not have time to circle back to this - thanks for addressing my
concerns :)

On Wed, Oct 12, 2016 at 4:00 PM, Tim Hockin thockin@google.com wrote:

nevermind, didn't read the echo

On Wed, Oct 12, 2016 at 3:47 PM, Jeff Grafton notifications@github.com
wrote:

@ixdy commented on this pull request.

In cluster/gce/util.sh
#34577:

+function delete-firewall-rules() {

  • for fw in $@; do
  • if [[ -n $(gcloud compute firewall-rules --project "${PROJECT}" describe "${fw}" --format='value(name)' 2>/dev/null || true) ]]; then
  •  gcloud compute firewall-rules delete --project "${PROJECT}" --quiet "${fw}" &
    
  • fi
  • done
  • kube::util::wait-for-jobs || {
  • echo -e "${color_red}Failed to delete firewall rules.${color_norm}" >&2
  • }
    +}

+function delete-network() {

  • if [[ -n $(gcloud compute networks --project "${PROJECT}" describe "${NETWORK}" --format='value(name)' 2>/dev/null || true) ]]; then
  • if ! gcloud compute networks delete --project "${PROJECT}" --quiet "${NETWORK}"; then
  •  echo "Failed to delete network '${NETWORK}'. Listing firewall-rules:"
    
  •  gcloud compute firewall-rules --project "${PROJECT}" list --filter="network=${NETWORK}"
    

the command is gcloud compute firewall-rules, so I'm not sure what your
comment means.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#34577, or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVDrFMfOO4iJCWxkq2cZytEOdhJY8ks5qzWN1gaJpZM4KUOgw
.

@zmerlynn zmerlynn added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2016
@ixdy
Copy link
Member Author

ixdy commented Oct 12, 2016

@k8s-bot test this please

@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GCE e2e failed for commit ee6e924. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@wonderfly
Copy link
Contributor

@fabioy Any plan to cherrypick this to 1.3 too?

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…ck-of-#34577-origin-release-1.4

Automated cherry pick of kubernetes#34577
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants