
Establish process for caring for non-merge-queue test suites #18116

Closed · 4 of 8 tasks
ikehz opened this issue Dec 3, 2015 · 6 comments
Assignees: @spxtr
Labels: area/test-infra, priority/important-soon (must be staffed and worked on either currently, or very soon, ideally in time for the next release)

ikehz (Contributor) commented Dec 3, 2015

We should have three different kinds of test suites: 'merge-queue', 'critical builds', and 'other'. The build cop is responsible for the first two.

Action items:

  • What is the definition of a critical build?
    • Write tools to enforce this state of affairs
  • Establish how to find the owner of a job and method (@spxtr)
  • Find and define an “owner” for each test method
  • Find and define the “owner” for each job (@spxtr)
  • Take action on soak tests: they should be green for a week
  • Identify candidates to put into critical (@jlowdermilk)
  • What is the point of the 'other' suites if no one fixes them?

Assigning to @spxtr to manage/delegate the above.

ikehz added the priority/critical-urgent and area/test-infra labels on Dec 3, 2015
ikehz added the priority/important-soon label and removed priority/critical-urgent on Dec 3, 2015
pmorie (Member) commented Dec 3, 2015

@kubernetes/rh-cluster-infra

eparis (Contributor) commented Dec 3, 2015

I feel like there must be a whole lot of 'why' and backstory that I'm missing. Can you explain it like I'm 5?

ikehz (Contributor, Author) commented Dec 3, 2015

Oh, sorry. This is kind of internal Google stuff.

We have test suites that block the merge-bot when they start failing; those are definitely watched by our build cop. We also have suites that are marked as "critical", but it's not clear what that means. Finally, there are a bunch of suites that aren't marked as "critical" at all, and a lot of them are failing consistently.

We need to establish how to decide when a build is allowed to be "critical" or "merge-blocking", and how to assign responsibility for the "other" builds, so that we don't end up with dozens of test suites lying around that aren't providing us any information.
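To make the three tiers concrete, here is a minimal Go sketch of how the classification might be encoded. The Suite type, tier names, and example field values are all hypothetical; the real job definitions live in the Jenkins configuration, not in code like this.

package suites

// Tier says how failures of a suite are handled (hypothetical encoding).
type Tier int

const (
	MergeQueue Tier = iota // failures block the submit queue; build cop watches
	Critical               // build cop watches, but failures don't block merges
	Other                  // the suite's owner alone is responsible for fixes
)

// Suite pairs a Jenkins job with its tier and a contact.
type Suite struct {
	Job   string // Jenkins job name, e.g. "kubernetes-e2e-gce-flannel"
	Tier  Tier
	Owner string // GitHub team or user to ping when it breaks
}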

spxtr (Contributor) commented Mar 3, 2016

I'm not sure that this is useful as a tracking issue, although the underlying issue still remains. For instance, kubernetes-e2e-gce-flannel has been broken for a long time, but nobody cares. Technically it has an owner, but that doesn't mean the owner needs to fix it anytime soon.

Soak tests are a long-term goal on their own that deserve their own issue.

I don't think we're going to get owners per test method. It might be worth making it policy to add a comment above each test naming the team to contact if it breaks, as in the sketch below. It's usually pretty obvious from the git history, though.
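A minimal sketch of what such a per-test annotation could look like. The package, test name, and owning team are invented for illustration, and nothing consumes the comment automatically; it only tells a human who to contact.

package e2e

import "testing"

// OWNER: @kubernetes/sig-network
// Contact the team above if this test starts failing.
// (Hypothetical example of the annotation proposed here.)
func TestPodNetworkConnectivity(t *testing.T) {
	t.Skip("illustrative placeholder; real test body elided")
}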

spxtr closed this as completed on Mar 16, 2016
bprashanth (Contributor) commented

> I'm not sure that this is useful as a tracking issue, although the underlying issue still remains. For instance, kubernetes-e2e-gce-flannel has been broken for a long time, but nobody cares.

Actually it times out; the suite is still running. The important part with flannel is to prove networking works for the scalability suite, which we've proven by running > 50% of the tests and the enormous cluster. I've been meaning to look at the timeouts but haven't had time (that's why it's a feature, right?).

bprashanth (Contributor) commented

Actually it WAS timing out at one point. Now it's:

7fb527513480d8229e363ca62d8548fbeea92cbe)
02:04:33 +++ kubernetes-salt.tar.gz uploaded (sha1 = 890376d5ba609abb84e0416510f4f69cfffa5aee)
02:04:34 Starting master and configuring firewalls
02:04:35 ERROR: (gcloud.compute.firewall-rules.create) Some requests did not succeed:
02:04:35  - The resource 'projects/kubernetes-flannel/global/firewalls/e2e-flannel-master-https' already exists
02:04:35 ERROR: (gcloud.compute.disks.create) Some requests did not succeed:
02:04:35  - The resource 'projects/kubernetes-flannel/zones/us-central1-f/disks/e2e-flannel-master-pd' already exists
02:04:35 2016/03/16 02:04:35 e2e.go:200: Error running up: exit status 1
02:04:35 2016/03/16 02:04:35 e2e.go:196: Step 'up' finished in 14.084397715s
02:04:35 

Which is #21564
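The errors in that log are resources leaked by an earlier run, which is why the next "up" step can't recreate them. A minimal cleanup sketch under that assumption, written in Go for consistency with the other sketches here: the resource names come from the log above and the flags are standard gcloud flags, but running this deletes live resources, so treat it as illustration only, not the project's actual cleanup tooling.

package main

import (
	"log"
	"os/exec"
)

func main() {
	// Delete the two resources the log reports as "already exists"
	// so that the next e2e "up" step can recreate them cleanly.
	cmds := [][]string{
		{"gcloud", "compute", "firewall-rules", "delete",
			"e2e-flannel-master-https",
			"--project=kubernetes-flannel", "--quiet"},
		{"gcloud", "compute", "disks", "delete",
			"e2e-flannel-master-pd",
			"--project=kubernetes-flannel",
			"--zone=us-central1-f", "--quiet"},
	}
	for _, c := range cmds {
		if out, err := exec.Command(c[0], c[1:]...).CombinedOutput(); err != nil {
			log.Printf("%v failed: %v\n%s", c, err, out)
		}
	}
}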
