
Density should cleanup better #10080

Closed
bprashanth opened this issue Jun 19, 2015 · 15 comments

Labels: area/test, priority/backlog

Comments

@bprashanth (Contributor)

In cases where one wants to run multiple iterations of density back to back, we really need to make sure a previous iteration's load doesn't leak into the next one. The 2 main culprits are:

  1. Deleting the namespace returns almost immediately, when it really takes ~15m to delete all the events (~20,000 events for 3000 pods). These deletes actually have a bad effect on the next run.
  2. We should make sure the pods are actually gone on the kubelet before starting a new run, e.g. by using the proxy to poll /pods (see the sketch below). Otherwise we're effectively putting > 30 pods on a node, and the kubelet is known to be slow during pod deletes.

This is obviously not a problem if we tear down the entire cluster between iterations.
@gmarek
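
For concreteness, here is a minimal sketch of the second check, not the test's actual code: the namespace name and the 10s interval are placeholders, and it assumes the kubelet's /pods endpoint is reachable via the API server's node proxy path.

```bash
# Minimal sketch (not the test's code): poll each kubelet through the API-server
# node proxy until it no longer reports pods from the previous run's namespace.
# "density-test" and the 10s interval are placeholders; the proxy path assumes
# the /api/v1/nodes/<node>/proxy/pods route is available.
NS=density-test
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  while kubectl get --raw "/api/v1/nodes/${node}/proxy/pods" \
      | grep -q "\"namespace\": *\"${NS}\""; do
    echo "kubelet on ${node} still reports pods from ${NS}; waiting..."
    sleep 10
  done
done
```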

@gmarek (Contributor) commented Jun 19, 2015

IMO it's not a problem. In Jenkins we do turn down clusters between runs, and when running manually it's all in the hands of the person running the tests.

In my case, when running tests I was always waiting for proper namespace deletion. The delete indeed returns straight away, but `get namespace` still shows the namespace in question as Terminating. Waiting for it to disappear is slightly faster than turning down the cluster.

In addition, if you want to save time, you don't want to delete the pods used to saturate the cluster (the first 30 per node), as bringing them up again takes 7-8 minutes. Instead you just play with the test, commenting out the pieces you don't want to use, rebuild it (which takes seconds), and run it again. Deleting the additional 100 pods together with their associated events is pretty quick.
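
Roughly, the manual wait described above looks like the following sketch; the namespace name is a placeholder.

```bash
# Sketch of the manual wait: the delete call returns right away, so block until
# the Terminating namespace actually disappears. "density-test" is a placeholder.
NS=density-test
kubectl delete namespace "$NS"
while kubectl get namespace "$NS" >/dev/null 2>&1; do
  sleep 10
done
```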

@gmarek (Contributor) commented Jun 19, 2015

I believe we can close this issue.

@bprashanth (Contributor, Author)

> Instead you just play with the test, commenting out the pieces you don't want to use, rebuild it (which takes seconds), and run it again.
> Deleting the additional 100 pods together with their associated events is pretty quick.

Sure, there's doing something like that, and there's leaving a clean environment for the next test by default so your tests can run in any order. We have iterations there as a feature, and I think we still run density as part of e2e, so anything that runs after it feels the after-effects. It's just too easy to shoot yourself in the foot if you don't do proper cleanup in a test; assuming it's always going to run alone in a fresh cluster doesn't sound like a good idea, even if it happens to work today.

@gmarek (Contributor) commented Jun 19, 2015

IIUC this (i.e. test isolation) is a deeper issue, not only for Density but also for the Load test and any other e2e that creates a non-trivial number of resources.

@wojtek-t (Member)

We no longer run those tests as part of the regular GCE e2e test - see #9858

@bprashanth (Contributor, Author)

See #3954 (comment).
I don't mind making strong-cleanup a flag that the comment-out crowd can tweak at will, but a test leaking state feels broken.

> IIUC this (i.e. test isolation) is a deeper issue, not only for Density but also for the Load test and any other e2e that creates a non-trivial number of resources.

We don't really have many of those in wide use today. Most e2es start 1-2 pods and don't care about latency, so the suite (on the whole) prioritizes throughput. Load tests like density are a special case because even their cleanup can disrupt the next run. But I agree we have a more general test-cleanup problem :(

@bprashanth (Contributor, Author)

The events deletion, at least, is handled by #10175.
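
Until that lands, the leftover events can also be cleared in bulk by hand with something like the following; the namespace name is a placeholder.

```bash
# Sketch: clear the leftover events for a namespace in one command.
kubectl delete events --all --namespace=density-test
```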

@gmarek (Contributor) commented Jun 23, 2015

Yup. I believe we can close this bug.

@gmarek gmarek closed this as completed Jun 23, 2015
@bprashanth (Contributor, Author)

Hmm, there's still the problem of cleaning up containers on the kubelet (delete currently just removes keys from etcd).

@bprashanth bprashanth reopened this Jun 23, 2015
@mbforbes mbforbes added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jun 24, 2015
@timothysc (Member)

/cc @rrati

@timothysc (Member)

@bprashanth details?

@bprashanth (Contributor, Author)

`kubectl delete pods` doesn't wait until the kubelet has deleted the containers on the node.
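
A small illustration of the gap, assuming GCE nodes running Docker; the pod, namespace, and node names below are placeholders.

```bash
# Illustration of the gap (GCE + Docker assumed; all names are placeholders):
kubectl delete pod density-pod-1 --namespace=density-test   # returns once etcd is updated
kubectl get pod density-pod-1 --namespace=density-test      # soon reports NotFound
# ...while the node itself may still be tearing the containers down:
gcloud compute ssh my-minion-1 --command "sudo docker ps | grep density-pod-1"
```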

@timothysc (Member)

Ohh yeah, it's been that way forever; it's non-blocking. It was my understanding that it was unlikely to change.

@gmarek (Contributor) commented Aug 31, 2015

@quinton-hoole @wojtek-t is this still a thing?

@wojtek-t (Member)

Currently it's hacked around. To provide a clear and fast solution we need to be able to quickly remove the whole namespace, and to do that we need to be able to remove a collection of objects in a single step.
But I think we can close this now as a duplicate of #10380.
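
For reference, "remove a collection of objects in a single step" corresponds to issuing one DELETE against the collection URL rather than per-object deletes. A sketch of what that looks like against today's API through a local `kubectl proxy`; the namespace name is a placeholder.

```bash
# Sketch: one DELETE against the collection URL instead of N per-object deletes,
# issued through a local `kubectl proxy`. "density-test" is a placeholder.
kubectl proxy --port=8001 &
curl -X DELETE "http://127.0.0.1:8001/api/v1/namespaces/density-test/events"
```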

@gmarek gmarek closed this as completed Aug 31, 2015