
Add flaky test mitigation #261

Merged: 6 commits into onsi:master on Jul 16, 2016
Conversation

@lavalamp (Contributor)

Implements #260

@tedsuo (Contributor) commented Jun 28, 2016

I see no integration tests for this feature, and I suspect that these new flags are not actually propagated. Can you add a test to integration/flags_test to prove that this works?

@lavalamp (Contributor, Author)

OK, I'll add a test asap.

@lavalamp changed the title from "WIP: add flaky test mitigation" to "Add flaky test mitigation" on Jun 29, 2016
@lavalamp (Contributor, Author)

Integration tests added. @tedsuo was there some particular thing you were concerned about? It passed as-is.

@tedsuo (Contributor) commented Jun 29, 2016

At a glance, it looked like these flags would have to propagate through BuildFlagArgs in order to work, at least in some cases: https://github.com/lavalamp/ginkgo/blob/cd831bb9a6e6a46115ca953b13b4a5b61d2c2ca1/config/config.go#L100
https://github.com/onsi/ginkgo/blob/master/ginkgo/testrunner/test_runner.go#L226

Thanks for the integration tests. I'll manually review tomorrow.
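
For context, the propagation in question: any new field added to GinkgoConfigType must also be emitted by config.BuildFlagArgs, or the ginkgo CLI will never forward it to the compiled test binary. A minimal sketch, trimmed to the one flag this PR adds (the flakeAttempts name mirrors the PR; everything else is simplified for illustration):

```go
package config

import "fmt"

// GinkgoConfigType is trimmed to the one field relevant here; the real
// struct in ginkgo's config package carries many more fields.
type GinkgoConfigType struct {
	FlakeAttempts int
}

// BuildFlagArgs is a simplified stand-in for the real function, which also
// takes reporter config and emits every other configured flag.
func BuildFlagArgs(prefix string, ginkgo GinkgoConfigType) []string {
	result := []string{}
	// ... existing flags appended here ...
	// Forward the new flag to the compiled test binary when it is set;
	// without this, the flag is parsed by the CLI but silently dropped.
	if ginkgo.FlakeAttempts > 1 {
		result = append(result, fmt.Sprintf("--%sflakeAttempts=%d", prefix, ginkgo.FlakeAttempts))
	}
	return result
}
```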

@lavalamp (Contributor, Author)

Ah, I didn't see that section. I will add them there tomorrow; that looks important.


@lavalamp (Contributor, Author)

@tedsuo I've added to that function and filed #263.

@lavalamp (Contributor, Author)

@tedsuo sorry to be pushy-- any more comments?

@onsi (Owner) commented Jul 1, 2016

hi @lavalamp - sorry for the delay.

I'll commit to merging this in, but some comments:

  • Let's remove FlakePassesRequired. I'm not completely convinced of the use case and would prefer to see it emerge from real-world usage.
  • I think we're missing some messaging to convey "this test has failed and is being retried". Doing this well may entail changing the spec summary object, or possibly the reporter interface itself. As it stands, I'm concerned folks will simply see the test suite pass while some tests look like they've failed. I won't gate the PR on this, but if you could spend a few minutes looking at the reporter/stenographer and consider whether there's a simple way to convey this context, I'd appreciate it.

@lavalamp (Contributor, Author) commented Jul 2, 2016

remove FlakePassesRequired

Done-- I was thinking I might want to require more than a 50% pass rate but we can cross that bridge when it becomes an actual problem.

changing the spec summary object or possibly the reporter interface

I've run out of steam tonight but I'll try to take a look at this soon.

@lavalamp (Contributor, Author) commented Jul 2, 2016

I guess the obvious thing to do is add a "NumberOfFlakySpecs" to the suite summary and let the reporters handle that as appropriate.

The stenographer interface is harder, since a "flake" is really N failures followed by a pass for the same spec. I'll think about this more tomorrow.

At the very least I can add some documentation warning users about the behavior of this flag (pushed just now).
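
As a rough sketch of that first idea (the field name follows the comment above; the exact shape that merges may differ):

```go
package types

// SuiteSummary, trimmed to the counters relevant to this discussion.
// NumberOfFlakySpecs would count specs that failed at least once, were
// retried, and ultimately passed.
type SuiteSummary struct {
	NumberOfTotalSpecs  int
	NumberOfPassedSpecs int
	NumberOfFailedSpecs int
	NumberOfFlakySpecs  int
}
```

A reporter could then print a separate flaked count in its summary line whenever NumberOfFlakySpecs is nonzero, addressing the concern that the suite passes while some specs look failed.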

@lavalamp (Contributor, Author) commented Jul 6, 2016

I added the flake concept to the suite summary and to the spec object.

Please let me know if you think this is good, or if I should do something like run the test N times and then report the result a single time.

@lavalamp (Contributor, Author) commented Jul 9, 2016

@onsi anything I can do to move this along?

@onsi (Owner) commented Jul 11, 2016

Hey @lavalamp, bear with me. We're exploring a couple of different patterns for doing this. I have a commit on a branch:

https://github.com/onsi/ginkgo/tree/runlists

That branch adds the ability to emit failed tests to a file and then run only the tests in that file. Details:

ginkgo --writeFailuresFile=path/to/file.json

This will write to file.json if and only if there are test failures. It exports a list of failed tests detailing each test's name, its coordinates, and the failure reason. If nothing fails, no file is written; however, if a file with that name already exists, it will not be removed (should it be?).

ginkgo --runFailuresFile=path/to/file.json

This will run only the tests listed in file.json. If file.json doesn't exist or is malformed, ginkgo panics (so it's on you to write the code that asks "is this file here? should I rerun the tests?"). I think passing both flags in will work, so you can rerun until there are no failures.

We're going to try this interface IRL and weigh it against your approach. It requires a bit more work on the test runner's side (e.g. a bash script that reruns ginkgo until it exits 0), but it leaves a better trail of breadcrumbs and will be harder for folks to abuse.
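
For illustration only, the rerun loop described above could live in a small Go wrapper instead of a bash script. This is a hypothetical sketch: the flag names come from the runlists branch and may change, and the retry cap is arbitrary.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	const failuresFile = "failures.json"
	// First pass runs the whole suite and records any failures.
	args := []string{"--writeFailuresFile=" + failuresFile}
	for attempt := 1; attempt <= 3; attempt++ {
		cmd := exec.Command("ginkgo", args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err == nil {
			return // ginkgo exited 0: suite passed
		}
		// Subsequent passes rerun only the recorded failures, rewriting
		// the file so the set shrinks as tests start passing.
		args = []string{
			"--writeFailuresFile=" + failuresFile,
			"--runFailuresFile=" + failuresFile,
		}
	}
	log.Fatal("suite still failing after 3 attempts")
}
```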

@lavalamp (Contributor, Author)

That approach will call BeforeSuite/AfterSuite multiple times. Also, putting logic in bash is unfortunate. Maybe I can look into using the logic in this change but emitting a second JUnit file for reruns?


@tedsuo (Contributor) commented Jul 11, 2016

FWIW, there are ways to wed the two approaches without any extra mechanisms:

ginkgo --writeReport=path/to/file.json
ginkgo --writeReport=path/to/file.json --retryFailures=2
ginkgo --retryReport=path/to/file.json --retryFailures=1

ginkgo --writeReport=path/to/file.json [[--retryFailures=0]]

writeReport writes a Ginkgo Report File just before calling AfterSuite. It contains test telemetry. If ginkgo exits with success, the report contains details of a successful pass. If any tests fail, the report records the reason and trace for each failure.

If retryFailures is set to a positive integer, failures in the report are then consumed.

ginkgo --retryReport=path/to/file.json [[--retryFailures=0]]

retryReport loads a Ginkgo Report File. If any failures were recorded, BeforeSuite is called, and the recorded failures are rerun according to the behavior of retryFailures.

@lavalamp (Contributor, Author)

I guess I don't understand what that gains us-- can't I just do that now by parsing the JUnit file and rerunning the failures?

@onsi (Owner) commented Jul 16, 2016

Hey @lavalamp, I'm going to pull this in. Do you mind writing up some docs for the gh-pages branch when you have a moment?

@onsi merged commit 7d32401 into onsi:master on Jul 16, 2016
@lavalamp (Contributor, Author)

Thanks! I will send you some docs.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request on Jul 18, 2016:
Automatic merge from submit-queue

Detect flakes in PR builder e2e runs

Won't be mergeable until onsi/ginkgo#261 is agreed upon and merged.

Tossing a PR here to get the e2e test to run on it.