
Add checking of events after all pods started to verify no failures in #6638

Merged · 3 commits merged into kubernetes:master on Apr 22, 2015

Conversation

@rrati commented on Apr 9, 2015:

density test #6637

@googlebot commented:

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

```go
// Tail of the list-based event-polling hunk this review thread is attached to.
	}.AsSelector(),
)
expectNoError(err)
last = current
```
@lavalamp (Member) commented on this hunk:

This is a really inefficient way of getting all the events you want. I recommend waiting for #6546 to land and using the NewInformer() thing it adds to collect events you care about.
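
For context, an informer-based collector along the lines being suggested might look like the following. This is a sketch only: the import paths, the framework.NewInformer signature, and the event-client calls are assumptions based on the Kubernetes tree of this era, not the PR's actual code.

```go
// Sketch, not the PR's code: assumes the framework package that #6546 added
// and the pre-rename GoogleCloudPlatform import paths of this era.
package e2e

import (
	"github.com/GoogleCloudPlatform/kubernetes/pkg/api"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/client"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/client/cache"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/controller/framework"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/fields"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/labels"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/runtime"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/watch"
)

// newEventCollector builds a controller that appends every event observed in
// namespace ns to *events via a single watch, instead of polling with List.
func newEventCollector(c *client.Client, ns string, events *[]*api.Event) *framework.Controller {
	_, controller := framework.NewInformer(
		&cache.ListWatch{
			ListFunc: func() (runtime.Object, error) {
				return c.Events(ns).List(labels.Everything(), fields.Everything())
			},
			WatchFunc: func(rv string) (watch.Interface, error) {
				return c.Events(ns).Watch(labels.Everything(), fields.Everything(), rv)
			},
		},
		&api.Event{},
		0, // no resync period; the watch alone feeds the handler
		framework.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				*events = append(*events, obj.(*api.Event))
			},
		},
	)
	return controller
}
```

A single long-lived watch keeps the apiserver cost roughly constant no matter how often the test inspects the collected events, which is exactly the heavyweight-List concern raised above.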

@rrati (Author) replied:

How close is #6546 to landing? Can we accept this as is and log another enhancement to convert to NewInformer?

@lavalamp (Member) replied:

#6546 should hopefully merge today. I am concerned that this method of getting events is going to be hard enough on apiserver that it will actually cause a performance problem. I'm also not sure about this stopping condition-- is it a guarantee that no events are generated in steady state? That may currently be the case but I don't think it's guaranteed...

@rrati (Author) replied:

In my testing, the only problems came about when a pod was continually restarted, as that causes a never-ending spew of events. Even in cases where there was an error, the test went okay. I dislike the notion of the test hanging because of a pod continually restarting, but at the moment I don't see a way to avoid it. Either the test provides a false positive by shutting off the event collection too soon and potentially missing an issue, or it hangs. The latter seemed better because it would force an investigation, but neither is desirable, in my opinion.

I could put in an insanely large timeout, like 20 minutes or something. Thoughts, @lavalamp?

@lavalamp (Member) replied:

Sorry for delay-- #6546 has merged!

I recommend using that to collect events, run your assertion on every event you see, and stop the controller after the replication controller has shut down. I don't know that it's worth it to wait for every last event-- there's not really a firm bound on how long that could take.
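
Sketched out, that recommendation is roughly the flow below. RunRC and DeleteRC are assumed to be the e2e helpers of this era, and newEventCollector is the hypothetical constructor from the sketch above, so treat the details as approximate.

```go
// Approximate flow, not the PR's code: start collecting before the pods come
// up, assert on events inside the informer's AddFunc, stop once the RC is gone.
stop := make(chan struct{})
go newEventCollector(c, ns, &events).Run(stop)

expectNoError(RunRC(c, RCName, ns, image, totalPods)) // bring all pods up
// ...per-event assertions run in the informer's AddFunc as events arrive...
expectNoError(DeleteRC(c, ns, RCName)) // tear the replication controller down

close(stop) // stop the controller; no need to wait for every last event
```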

@rrati (Author) replied:

@lavalamp I'm not sure I see how using #6546 will help ensure that we inspect all the events that are logged. I originally only cared about the start events, but there's no reason not to care about the shutdown events as well. If I move the inspection of the events until after the rc is deleted, I should be able to wait for all the events to be logged. As long as the system is properly ensuring pods are shut down, the event stream should stop within a reasonable time after all the pods are stopped. This should bound the test and prevent it from running forever.

I'm not sure how to do that with #6546, though. I don't see any way to use that change and know that all the events have been generated and it's okay to stop the controller.

@lavalamp (Member) replied:

You could use exactly the same logic-- stop if you haven't gotten an event in 10 seconds. It's just that the current code does a lot of lists, which are unfortunately quite heavyweight.

@rrati (Author) replied:

If the concern is the efficiency of the test, then I would rather get the change in now and enhance it later. Ensuring the condition is tested is more important than ensuring it is done in the most efficient way, in my opinion; I couldn't say that any other e2e test is performing its functions in the most efficient way possible.

My concern has been bounding the event-collection loop, and I've done that in a change I can push.

@lavalamp self-assigned this on Apr 9, 2015
@rrati force-pushed the events-in-density-6637 branch from dfdd36e to 020ba6a on April 21, 2015
@timothysc (Member) commented:

cc - @wojtek-t @fgrzadkowski

@rrati (Author) commented on Apr 22, 2015:

@lavalamp Any issues with the latest changes?

```go
// Poll every 10 seconds until no new events arrived since the previous check,
// or until the overall timeout expires.
timeout := 10 * time.Minute
for start := time.Now(); last < current && time.Since(start) < timeout; time.Sleep(10 * time.Second) {
	last = current
	current = len(events)
}
```
@lavalamp (Member) commented on this hunk:

I think it's a pretty harmless race, but you have the potential for concurrent reads and writes to events. Please send a follow-up PR that adds a lock.
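
A minimal sketch of that follow-up, assuming events is the slice fed from the informer's AddFunc; recordEvent and observedEventCount are hypothetical names, not the PR's code:

```go
// Guard the shared slice: the informer goroutine appends while the polling
// loop reads its length, so both sides must take the lock. Needs "sync"
// imported in the test file.
var (
	events     []*api.Event
	eventsLock sync.Mutex
)

// recordEvent is wired in as the informer's AddFunc.
func recordEvent(obj interface{}) {
	eventsLock.Lock()
	defer eventsLock.Unlock()
	events = append(events, obj.(*api.Event))
}

// observedEventCount replaces direct len(events) reads in the polling loop.
func observedEventCount() int {
	eventsLock.Lock()
	defer eventsLock.Unlock()
	return len(events)
}
```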

@lavalamp (Member) commented:

LGTM, but please send a follow-up.

@lavalamp added a commit that referenced this pull request on Apr 22, 2015: "Add checking of events after all pods started to verify no failures in …"

@lavalamp merged commit 2b241e7 into kubernetes:master on Apr 22, 2015