
Density should cleanup better #10080

Closed
bprashanth opened this issue Jun 19, 2015 · 15 comments

Labels: area/test, priority/backlog

Comments

@bprashanth (Contributor)

In cases where one wants to run multiple iterations of density back to back, we really need to make sure a previous iteration's load doesn't leak into the next one. The 2 main culprits are:

  1. Deleting the namespace returns almost immediately, when it really takes ~15m to delete all the events (~20,000 events for 3000 pods). These deletes actually have a bad effect on the next run.
  2. We should make sure the pods are actually gone on the kubelet before starting a new run, e.g. by using the proxy to poll /pods (see the sketch below). Otherwise we're effectively putting > 30 pods on a node, and the kubelet is known to be slow during pod deletes.

This is obviously not a problem if we tear down the entire cluster between iterations.
@gmarek
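
For concreteness, here is a minimal sketch of the second check, not the test's actual code: the namespace name and the 10s interval are placeholders, and it assumes the kubelet's /pods endpoint is reachable via the API server's node proxy path.

```bash
# Minimal sketch (not the test's code): poll each kubelet through the API-server
# node proxy until it no longer reports pods from the previous run's namespace.
# "density-test" and the 10s interval are placeholders; the proxy path assumes
# the /api/v1/nodes/<node>/proxy/pods route is available.
NS=density-test
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  while kubectl get --raw "/api/v1/nodes/${node}/proxy/pods" \
      | grep -q "\"namespace\": *\"${NS}\""; do
    echo "kubelet on ${node} still reports pods from ${NS}; waiting..."
    sleep 10
  done
done
```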

@gmarek (Contributor) commented Jun 19, 2015

IMO it's not a problem. In Jenkins we do turn down clusters between runs, and when running manually it's all in the hands of the person running the tests.

In my case, when running tests I was always waiting for proper namespace deletion. The delete indeed returns straight away, but `get namespace` still shows the namespace in question as Terminating. Waiting for it to disappear is slightly faster than turning down the cluster.

In addition, if you want to save time, you don't want to delete the pods used to saturate the cluster (the first 30 per node), as bringing them up again takes 7-8 minutes. Instead you just play with the test, commenting out the pieces you don't want to use, rebuild it (which takes seconds), and run it again. Deleting the additional 100 pods together with their associated events is pretty quick.
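
Roughly, the manual wait described above looks like the following sketch; the namespace name is a placeholder.

```bash
# Sketch of the manual wait: the delete call returns right away, so block until
# the Terminating namespace actually disappears. "density-test" is a placeholder.
NS=density-test
kubectl delete namespace "$NS"
while kubectl get namespace "$NS" >/dev/null 2>&1; do
  sleep 10
done
```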

@gmarek (Contributor) commented Jun 19, 2015

I believe we can close this issue.

@bprashanth (Contributor, Author)

> Instead you just play with the test, commenting out the pieces you don't want to use, rebuild it (which takes seconds), and run it again.
> Deleting the additional 100 pods together with their associated events is pretty quick.

Sure, there's doing something like that, and there's leaving a clean environment for the next test by default so your tests can run in any order. We have iterations there as a feature, and I think we still run density as part of e2e, so anything that runs after it feels the after-effects. It's just too easy to shoot yourself in the foot if you don't do proper cleanup in a test; assuming it's always going to run alone in a fresh cluster doesn't sound like a good idea, even if it happens to work today.

@gmarek (Contributor) commented Jun 19, 2015

IIUC this (i.e. test isolation) is a deeper issue, not only for Density but also for the Load test and any other e2e that creates a non-trivial number of resources.

@wojtek-t (Member)

We no longer run those tests as part of the regular GCE e2e test - see #9858

@bprashanth (Contributor, Author)

See #3954 (comment).
I don't mind making strong-cleanup a flag that the comment-out crowd can tweak at will, but a test leaking state feels broken.

> IIUC this (i.e. test isolation) is a deeper issue, not only for Density but also for the Load test and any other e2e that creates a non-trivial number of resources.

We don't really have many of those in wide use today. Most e2es start 1-2 pods and don't care about latency, so the suite (on the whole) prioritizes throughput. Load tests like density are a special case because even their cleanup can disrupt the next run. But I agree we have a more general test-cleanup problem :(

@bprashanth (Contributor, Author)

The events deletion, at least, is handled by #10175.
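
Until that lands, the leftover events can also be cleared in bulk by hand with something like the following; the namespace name is a placeholder.

```bash
# Sketch: clear the leftover events for a namespace in one command.
kubectl delete events --all --namespace=density-test
```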

@gmarek (Contributor) commented Jun 23, 2015

Yup. I believe we can close this bug.

@gmarek gmarek closed this as completed Jun 23, 2015
@bprashanth (Contributor, Author)

Hmm, there's still the problem of cleaning up containers on the kubelet (delete currently just removes keys from etcd).

@bprashanth bprashanth reopened this Jun 23, 2015
@mbforbes mbforbes added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jun 24, 2015
@timothysc (Member)

/cc @rrati

@timothysc (Member)

@bprashanth details?

@bprashanth (Contributor, Author)

`kubectl delete pods` doesn't wait until the kubelet has deleted the containers on the node.
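
A small illustration of the gap, assuming GCE nodes running Docker; the pod, namespace, and node names below are placeholders.

```bash
# Illustration of the gap (GCE + Docker assumed; all names are placeholders):
kubectl delete pod density-pod-1 --namespace=density-test   # returns once etcd is updated
kubectl get pod density-pod-1 --namespace=density-test      # soon reports NotFound
# ...while the node itself may still be tearing the containers down:
gcloud compute ssh my-minion-1 --command "sudo docker ps | grep density-pod-1"
```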

@timothysc (Member)

Ohh yeah, it's been that way forever; it's non-blocking. It was my understanding that it was unlikely to change.

@gmarek (Contributor) commented Aug 31, 2015

@quinton-hoole @wojtek-t is this still a thing?

@wojtek-t (Member)

Currently it's hacked around. To provide a clear and fast solution we need to be able to quickly remove the whole namespace, and to do that we need to be able to remove a collection of objects in a single step.
But I think we can close this now as a duplicate of #10380.
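
For reference, "remove a collection of objects in a single step" corresponds to issuing one DELETE against the collection URL rather than per-object deletes. A sketch of what that looks like against today's API through a local `kubectl proxy`; the namespace name is a placeholder.

```bash
# Sketch: one DELETE against the collection URL instead of N per-object deletes,
# issued through a local `kubectl proxy`. "density-test" is a placeholder.
kubectl proxy --port=8001 &
curl -X DELETE "http://127.0.0.1:8001/api/v1/namespaces/density-test/events"
```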

@gmarek gmarek closed this as completed Aug 31, 2015