
[bug fix] TestCascadingDeletion during pull-kubernetes-unit-test flaky #55653

Merged
merged 1 commit into kubernetes:master on Nov 18, 2017

Conversation

hzxuzhonghu
Member

What this PR does / why we need it:
Fixes the pull-kubernetes-unit-test flake.

go test -v k8s.io/kubernetes/test/integration/garbagecollector -run TestCascadingDeletion$
panic: close of closed channel
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x111
/usr/local/go/src/runtime/panic.go:491 +0x283
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:259 +0xb80
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/garbagecollector/garbagecollector.go:123 +0x39
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/garbagecollector/garbagecollector.go:211 +0x20f
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x5e
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/garbagecollector/garbagecollector.go:172 +0xd9
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/garbagecollector/garbage_collector_test.go:268 +0xc9
				from junit_3a3d564eebb1750e5c904cc525d117617fc0af51_20171113-090812.xml
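
For context, a minimal standalone Go sketch (illustrative only, not code from this PR): closing an already-closed channel panics at runtime, so any teardown path that can run more than once needs a guard like the one this fix adds.

package main

// Illustrative only: reproduces the "close of closed channel" class of
// panic seen in the trace above, and shows the guard that avoids it.
func main() {
	stopCh := make(chan struct{})
	close(stopCh)
	// close(stopCh) // would panic: "close of closed channel"

	// Guard pattern: nil out the channel after closing it, so a second
	// teardown pass becomes a no-op instead of a double close.
	guarded := make(chan struct{})
	for i := 0; i < 2; i++ {
		if guarded != nil {
			close(guarded)
			guarded = nil
		}
	}
}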

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #55652

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 14, 2017
@dims
Member

dims commented Nov 14, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 14, 2017
@@ -340,6 +341,7 @@ func (gb *GraphBuilder) Run(stopCh <-chan struct{}) {
 		if monitor.stopCh != nil {
 			stopped++
 			close(monitor.stopCh)
+			monitor.stopCh = nil
Member

I think either this or setting gb.monitors to empty at the end of this function will fix the race?

Member Author

This modification also has a data race, because monitor.stopCh is also used by:

func (m *monitor) Run() {
	m.controller.Run(m.stopCh)
}

I think your suggestion may be better.
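
A rough sketch of the suggested direction, using simplified stand-in types (the field and method names here are assumptions, not the merged change): stop every monitor while holding the lock, then reset the monitors map so a later teardown pass finds nothing left to close.

package main

import "sync"

// Simplified stand-ins mirroring the names in the discussion above;
// not the real GraphBuilder.
type monitor struct {
	stopCh chan struct{}
}

type GraphBuilder struct {
	monitorLock sync.Mutex
	monitors    map[string]*monitor
}

// stopMonitors is a hypothetical helper: after closing every monitor's
// stop channel, it resets gb.monitors so a later teardown pass (e.g.
// from syncMonitors) cannot double-close the same channels.
func (gb *GraphBuilder) stopMonitors() {
	gb.monitorLock.Lock()
	defer gb.monitorLock.Unlock()
	for _, m := range gb.monitors {
		if m.stopCh != nil {
			close(m.stopCh)
		}
	}
	gb.monitors = nil
}

func main() {
	gb := &GraphBuilder{monitors: map[string]*monitor{
		"pods": {stopCh: make(chan struct{})},
	}}
	gb.stopMonitors()
	gb.stopMonitors() // safe: the second call finds no monitors to stop
}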

@caesarxuchao
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2017
@hzxuzhonghu
Member Author

@caesarxuchao Would you tag again, please?

@@ -342,6 +342,9 @@ func (gb *GraphBuilder) Run(stopCh <-chan struct{}) {
 			close(monitor.stopCh)
 		}
 	}
 
+	// Set monitors to nil to prevent data race with syncMonitors.
Contributor

I think the comment is inaccurate. There's no race (as both Run and syncMonitors acquire monitorLock). It's just that syncMonitors will try doing the same monitor teardown work, not expecting anything but itself to close the monitor channels. Your fix is still fine, I just ask that the comment be updated to be less scary. How about "reset monitors so that the graph builder can be safely re-run/synced"?

Member Author

OK, thank you.
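
Restating that point as a hypothetical sketch (simplified types, not the real GraphBuilder): the mutex serializes the two teardown paths, so there is no data race in the Go sense, but the second pass would still double-close the channel unless the shared state is reset first.

package main

import "sync"

// Simplified stand-in for the builder discussed above; names are
// illustrative, not the actual Kubernetes types.
type builder struct {
	mu     sync.Mutex
	stopCh chan struct{}
}

// teardown can be reached from more than one path (e.g. Run and a later
// re-sync). The mutex rules out concurrent access, but only the reset of
// stopCh keeps a second call from double-closing the channel.
func (b *builder) teardown() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.stopCh != nil {
		close(b.stopCh)
		b.stopCh = nil
	}
}

func main() {
	b := &builder{stopCh: make(chan struct{})}
	b.teardown()
	b.teardown() // serialized by mu, and safe only because stopCh was reset
}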

@caesarxuchao
Member

lgtm after addressing @ironcladlou's comment. Thank you!

@caesarxuchao caesarxuchao added kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flaky test. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 16, 2017
@hzxuzhonghu
Member Author

@caesarxuchao Done.

@hzxuzhonghu
Member Author

/test pull-kubernetes-node-e2e

@caesarxuchao
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, hzxuzhonghu

Associated issue: 55652

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@caesarxuchao
Member

Failed test is #55917

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

2 similar comments

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 55908, 55829, 55293, 55653, 55665). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 69098ad into kubernetes:master Nov 18, 2017
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[double close channel] TestCascadingDeletion during pull-kubernetes-unit-test flaky
8 participants