Node goes away e2e test #3520

erictune · 2015-01-15T16:49:02Z

Write an e2e test that has a pod and a replication controller and multiple nodes. Delete the node that the pod is on, and see that the absence of the pod/node is detected by the replication controller, and that a replacement pod is created.
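A toy sketch of the scenario described above, as an in-memory model rather than a real e2e test. Everything here (`FakeCluster`, `schedule_pod`, `delete_node`, `reconcile`, the node names) is hypothetical and does not use the Kubernetes API; it only shows the sequence the test would assert on: one pod under a replication controller, the pod's node is deleted, the pod is gone, and a replacement appears on a surviving node.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class FakeCluster:
    """Toy stand-in for a cluster (assumption: not the real Kubernetes API)."""
    nodes: Set[str] = field(default_factory=set)
    pods: Dict[str, str] = field(default_factory=dict)  # pod name -> node name
    _counter: int = 0

    def _load(self, node: str) -> int:
        """Number of pods currently placed on `node`."""
        return sum(1 for n in self.pods.values() if n == node)

    def schedule_pod(self) -> str:
        """Create a pod on the least-loaded node (ties broken by name)."""
        if not self.nodes:
            raise RuntimeError("no nodes available")
        node = min(sorted(self.nodes), key=self._load)
        self._counter += 1
        name = f"pod-{self._counter}"
        self.pods[name] = node
        return name

    def delete_node(self, node: str) -> None:
        """Remove a node together with every pod that was placed on it."""
        self.nodes.discard(node)
        self.pods = {p: n for p, n in self.pods.items() if n != node}

    def reconcile(self, replicas: int) -> None:
        """Replication-controller step: create pods until the count matches."""
        while len(self.pods) < replicas:
            self.schedule_pod()


def node_goes_away_scenario() -> Tuple[bool, str, str, str, str]:
    cluster = FakeCluster(nodes={"node-a", "node-b", "node-c"})
    cluster.reconcile(replicas=1)              # controller creates the pod
    (pod, node), = cluster.pods.items()
    cluster.delete_node(node)                  # the node goes away
    pod_lost = len(cluster.pods) == 0
    cluster.reconcile(replicas=1)              # controller notices and replaces
    (new_pod, new_node), = cluster.pods.items()
    return pod_lost, pod, node, new_pod, new_node
```

A real test would replace `delete_node` with an actual node removal and poll the API server until the replacement pod is running, instead of calling `reconcile` directly.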

zmerlynn · 2015-01-15T17:32:36Z

The follow-up to this is what happens when the node reboots and tries to
rejoin. :) Node on fire is "easy".

erictune · 2015-03-24T15:04:51Z

This could also be accomplished in an integration test.

satnam6502 · 2015-03-24T16:00:07Z

As a variation of my serve_hostnames soak/reliability test, I intend to make a "cauldron" version of it which uses a replication controller and which every so often kills or adds pods and checks that the expected number of pods is up over a given window. Would that meet the requirements of the issue?
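One way to picture the "cauldron" idea is the toy loop below. It is only a sketch under stated assumptions: the names `cauldron` and `reconcile`, the 50/50 choice between killing and adding a pod, and the seeded random generator are all hypothetical, and nothing here talks to a real cluster or to the serve_hostnames test.

```python
import random
from typing import List, Set


def reconcile(pods: Set[str], replicas: int, next_id: int) -> int:
    """Add or remove pods until exactly `replicas` remain; return the next unused id."""
    while len(pods) < replicas:
        pods.add(f"pod-{next_id}")
        next_id += 1
    while len(pods) > replicas:
        pods.remove(sorted(pods)[0])
    return next_id


def cauldron(replicas: int, rounds: int, seed: int = 0) -> List[int]:
    """Disturb the pod set each round, reconcile, and record the pod count."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    pods: Set[str] = set()
    next_id = reconcile(pods, replicas, 0)
    observed = []
    for _ in range(rounds):
        if pods and rng.random() < 0.5:
            pods.remove(rng.choice(sorted(pods)))  # kill a random pod
        else:
            pods.add(f"pod-{next_id}")             # add a stray pod
            next_id += 1
        next_id = reconcile(pods, replicas, next_id)
        observed.append(len(pods))
    return observed
```

The check over the window is then that every recorded count equals the desired replica count, for example `all(n == 3 for n in cauldron(3, 20))`.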

ddysher · 2015-03-30T13:11:12Z

/cc @gmarek

erictune · 2015-03-30T15:37:54Z

@satnam6502 That test would certainly be better than nothing.

I'm of the opinion that focused, standalone e2e tests have a lot of value too, and if it were up to me, I'd write one of those before embedding the test into a test with a larger scope (reliability). Basically, I think each test should have a purpose which can be described in like one sentence, without use of conjunctions. But that's just my opinion.

satnam6502 · 2015-03-30T16:07:29Z

Agreed, coherent, focused e2e tests are of great value. I can take this one unless someone else is super keen to do it. Back-to-back meetings today in Seattle, but I expect I can have it done on Tuesday/Wednesday if that's not too late for you.

erictune · 2015-03-30T18:18:59Z

not super keen.

satnam6502 · 2015-04-01T01:07:20Z

Unassigning temporarily while I look at issues with our network e2e test. No worries if someone else wants to pick this up before I can get back to it.

gmarek · 2015-08-21T14:19:29Z

@jszczepkowski - do your restart tests cover this?

jszczepkowski · 2015-08-24T07:48:15Z

This exact test case is covered by Nodes.Resize and Nodes.Network. Closing.

erictune added the area/test and area/availability labels Jan 15, 2015

ddysher mentioned this issue Jan 22, 2015

Sync node status from node controller to master. #3733

Merged

goltermann added the priority/backlog label Jan 28, 2015

ddysher mentioned this issue Feb 8, 2015

Remove pods from failed node #4241

Merged

roberthbailey added the area/test-infra label Feb 18, 2015

satnam6502 self-assigned this Mar 31, 2015

satnam6502 removed their assignment Apr 1, 2015

jszczepkowski self-assigned this Aug 24, 2015

jszczepkowski closed this as completed Aug 24, 2015
