Add checking of events after all pods started to verify no failures in #6638
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
	}.AsSelector(),
)
expectNoError(err)
last = current
This is a really inefficient way of getting all the events you want. I recommend waiting for #6546 to land and using the NewInformer() thing it adds to collect events you care about.
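A rough sketch of what informer-based collection could look like, written with today's client-go names (cache.NewInformer, NewListWatchFromClient) rather than whatever exact API #6546 adds; the namespace scoping, channel buffering, and handler wiring are assumptions for illustration, not this PR's code:

```go
// Sketch only: stream events via a single watch instead of repeated lists.
package e2e

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// collectEvents delivers every Event seen in namespace on the returned channel
// until stop is closed, at which point the channel is closed as well.
func collectEvents(c kubernetes.Interface, namespace string, stop <-chan struct{}) <-chan *v1.Event {
	out := make(chan *v1.Event, 100)
	lw := cache.NewListWatchFromClient(
		c.CoreV1().RESTClient(), "events", namespace, fields.Everything())
	_, controller := cache.NewInformer(lw, &v1.Event{}, 0, cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if e, ok := obj.(*v1.Event); ok {
				out <- e // hand each event to the test's checking loop
			}
		},
	})
	go func() {
		controller.Run(stop) // blocks until stop is closed
		close(out)
	}()
	return out
}
```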
How close is #6546 to landing? Can we accept this as is and log another enhancement to convert to NewInformer?
#6546 should hopefully merge today. I am concerned that this method of getting events is going to be hard enough on apiserver that it will actually cause a performance problem. I'm also not sure about this stopping condition-- is it a guarantee that no events are generated in steady state? That may currently be the case but I don't think it's guaranteed...
In my testing, the only problems came about when a pod was continually restarted, as that causes a never-ending spew of events. Even in cases where there was an error, the test went okay. I dislike the notion of the test hanging because of a pod continually restarting, but at the moment I don't see a way to avoid it. Either the test provides a false positive by shutting off the event collection too soon and potentially missing an issue, or it hangs. The latter seemed better because it would force an investigation, but neither is desirable imo.
I could put in an insanely large timeout, like 20 minutes or something. Thoughts @lavalamp ?
Sorry for delay-- #6546 has merged!
I recommend using that to collect events, run your assertion on every event you see, and stop the controller after the replication controller has shut down. I don't know that it's worth it to wait for every last event-- there's not really a firm bound on how long that could take.
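Continuing the sketch above (same assumed package and imports), roughly what that flow could look like; checkEvent is a hypothetical stand-in for the test's per-event assertion:

```go
// Sketch of the flow described here: run the assertion on every event as it
// arrives, and stop the informer once the replication controller is gone.
func watchAndCheck(c kubernetes.Interface, ns string, checkEvent func(*v1.Event)) (stopWatching func()) {
	stop := make(chan struct{})
	events := collectEvents(c, ns, stop) // from the sketch above
	go func() {
		for e := range events { // ends when collectEvents closes the channel
			checkEvent(e)
		}
	}()
	// The caller invokes the returned func after deleting the RC; there is no
	// firm bound on when the very last event would show up, so we don't wait.
	return func() { close(stop) }
}
```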
@lavalamp I'm not sure I see how using 6546 will help ensure that we inspect all the events that are logged. I originally only cared about the start events, but there's no reason not to care about the shutdown events as well. If I move the inspection of the events to after the rc is deleted, I should be able to wait for all the events to be logged. As long as the system is properly ensuring pods are shut down, the event stream should stop within a reasonable time after all the pods are stopped. This should bound the test and prevent it from running forever.
I'm not sure how to do that with 6546 though. I don't see any way to use that change and know that events have all been generated and it's ok to stop the controller.
You could use exactly the same logic-- stop if you haven't gotten an event in 10 seconds. It's just that the current code does a lot of lists which are unfortunately quite heavyweight.
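If you did want that same "no event for N seconds" rule on top of a watch, a small quiescence helper along these lines would do it; illustrative only, in the same assumed package as the earlier sketches (plus a time import):

```go
// waitForQuiet drains events until none has arrived for `quiet`, then returns
// everything collected. The producer closing the channel also ends the wait.
func waitForQuiet(events <-chan *v1.Event, quiet time.Duration) []*v1.Event {
	var seen []*v1.Event
	timer := time.NewTimer(quiet)
	defer timer.Stop()
	for {
		select {
		case e, ok := <-events:
			if !ok {
				return seen // producer closed the channel
			}
			seen = append(seen, e)
			if !timer.Stop() {
				<-timer.C // drain a timer that already fired
			}
			timer.Reset(quiet)
		case <-timer.C:
			return seen // no event for `quiet`; treat the stream as settled
		}
	}
}
```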
If the concern is the efficiency of the test, then I would rather get the change in now and enhance it later. Ensuring the condition is tested is more important than ensuring it is done in the most efficient way imo. I couldn't say that any other e2e test is performing its functions in the most efficient way possible.
My concern has been bounding the event collection loop, and I've done that in a change I can push.
Force-pushed from dfdd36e to 020ba6a. Commit: "…if not all are logged 10 minutes after all pods are started" (kubernetes#6637).
cc - @wojtek-t @fgrzadkowski
@lavalamp Any issues with the latest changes?
timeout := 10 * time.Minute
for start := time.Now(); last < current && time.Since(start) < timeout; time.Sleep(10 * time.Second) {
	last = current
	current = len(events)
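For readers skimming the excerpt above, a stand-alone version of that bounded polling idea might look like the following; countEvents and the parameter values are illustrative stand-ins, not the PR's exact code:

```go
// waitForEventsToSettle polls the shared event count every pollInterval and
// stops once no new events arrived between polls, or once timeout elapses.
// It reports whether the stream settled before the deadline.
func waitForEventsToSettle(countEvents func() int, pollInterval, timeout time.Duration) bool {
	last := -1
	current := countEvents()
	start := time.Now()
	for last < current && time.Since(start) < timeout {
		time.Sleep(pollInterval) // give the system a chance to emit more events
		last = current
		current = countEvents()
	}
	return last >= current // true if no new events arrived in the last interval
}
```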
I think it's a pretty harmless race, but you have the potential for concurrent reads and writes to events. Please send a follow-up PR that adds a lock.
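A sketch of the kind of follow-up being asked for: keep the shared slice behind a mutex so the collector goroutine (writer) and the polling loop (reader) cannot race. The type and method names are illustrative, and sync would need to be imported in the assumed package:

```go
// eventLog guards the shared event slice so the collector goroutine (writer)
// and the polling loop (reader) never touch it concurrently.
type eventLog struct {
	mu     sync.Mutex
	events []*v1.Event
}

func (l *eventLog) add(e *v1.Event) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, e)
}

func (l *eventLog) len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.events)
}
```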
LGTM, but please send a follow-up.
Add checking of events after all pods started to verify no failures in density test #6637