Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error handling in gc e2e test #60671

Merged

Conversation

jennybuckley
Copy link

What this PR does / why we need it:
Error messages were not being surfaced in log of GC e2e test, part of fixing #60463

@k8s-ci-robot
Copy link
Contributor

@jennybuckley: Adding do-not-merge/release-note-label-needed because the release note process has not been followed.

One of the following labels is required "release-note", "release-note-action-required", or "release-note-none".
Please see: https://git.k8s.io/community/contributors/devel/pull-requests.md#write-release-notes-if-needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 1, 2018
@k8s-ci-robot k8s-ci-robot requested review from fejta and janetkuo March 1, 2018 22:43
@caesarxuchao
Copy link
Member

/lgtm
/approval

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2018
@caesarxuchao
Copy link
Member

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 1, 2018
@lavalamp lavalamp added this to the v1.10 milestone Mar 1, 2018
@lavalamp
Copy link
Member

lavalamp commented Mar 1, 2018

If this weren't a test I'd ask for this to use the error list concept that (I think) the validation code defines (uses?) :)

@jennybuckley
Copy link
Author

/kind bug
/priority important-soon
/sig api-machinery

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Mar 2, 2018
@@ -610,19 +611,20 @@ var _ = SIGDescribe("Garbage collector", func() {
return verifyRemainingDeploymentsReplicaSetsPods(f, clientSet, deployment, 0, 1, 2)
})
if err != nil {
err = fmt.Errorf("Failed to wait to see if the garbage collecter mistakenly deletes the rs: %v", err)
errList := err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accumulating into list would probably make more sense than passing it back into fmt.Errorf

@@ -553,13 +553,14 @@ var _ = SIGDescribe("Garbage collector", func() {
return verifyRemainingDeploymentsReplicaSetsPods(f, clientSet, deployment, 0, 0, 0)
})
if err == wait.ErrWaitTimeout {
err = fmt.Errorf("Failed to wait for all rs to be garbage collected: %v", err)
errList := err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here... accumulate?

@jennybuckley jennybuckley force-pushed the gc-test-error-message branch from 6dfc643 to af684f9 Compare March 6, 2018 22:44
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 6, 2018
err = wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
return verifyRemainingDeploymentsReplicaSetsPods(f, clientSet, deployment, 0, 1, 2)
})
By("wait for 10s to see if the garbage collector mistakenly deletes the rs")
Copy link
Author

@jennybuckley jennybuckley Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changed because it doesn't make sense to use PollImmediate to make sure something doesn't happen (at least the way it was being used here). If it hasn't happened by the first polling the PollImmediate will succeed and then what we are watching for may end up occurring later and we will miss it. How we were doing it, the test would only wait either ~0, or 5 seconds (depending on how long the deletion of the Deployment took, usually the 0s case), so this is actually an increase in wait time despite the log message diff indicating a decrease.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jennybuckley
Copy link
Author

jennybuckley commented Mar 6, 2018

@liggitt I updated to use aggregated errors and also to fix two more places where errors could be missed and the test could pass when it should fail

@@ -552,15 +552,17 @@ var _ = SIGDescribe("Garbage collector", func() {
err = wait.PollImmediate(500*time.Millisecond, 1*time.Minute, func() (bool, error) {
return verifyRemainingDeploymentsReplicaSetsPods(f, clientSet, deployment, 0, 0, 0)
})
if err == wait.ErrWaitTimeout {
err = fmt.Errorf("Failed to wait for all rs to be garbage collected: %v", err)
if err != nil {
Copy link
Author

@jennybuckley jennybuckley Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix for error swallowing missed by f08f504

@jennybuckley jennybuckley force-pushed the gc-test-error-message branch from ece5a70 to 218199c Compare March 7, 2018 18:59
})
if err != nil {
err = fmt.Errorf("Failed to wait to see if the garbage collecter mistakenly deletes the rs: %v", err)
By("wait for 30 seconds to see if the garbage collector mistakenly deletes the rs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why poll at all? why not just sleep 30 seconds, then list and ensure length is 1? expecting to get a timeout error is confusing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt
that is how I had it but changed it in favor of consistency with other similar code in this file
#60671 (comment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jennybuckley your code is better, let's change the rest of code to comply with yours.

@jdumars
Copy link
Member

jdumars commented Mar 13, 2018

@sig-testing-approvers PTAL ASAP

@liggitt
Copy link
Member

liggitt commented Mar 14, 2018

this is an improvement, but is it a release blocker?

@jennybuckley jennybuckley force-pushed the gc-test-error-message branch from 218199c to 5f518e6 Compare March 14, 2018 17:29
@jennybuckley
Copy link
Author

@liggitt no, I don't think it's a release blocker

@liggitt
Copy link
Member

liggitt commented Mar 14, 2018

let's move it out of the milestone, then. we're down to the wire on 1.10 and only essential things should be in at this point

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.10 milestone Mar 14, 2018
@jennybuckley
Copy link
Author

/retest

@fejta fejta removed their request for review March 30, 2018 18:52
@jennybuckley
Copy link
Author

/test pull-kubernetes-e2e-gce

@jennybuckley
Copy link
Author

jennybuckley commented Jun 1, 2018

@liggitt @roycaihw
Maybe this would be useful for debugging the new GC flakes. it seemed less important once the gc flakes died down a while ago, but it looks like they are back.

return false, nil
}); err != nil && err != wait.ErrWaitTimeout {
framework.Failf("%v", err)
time.Sleep(30 * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: polling helps us fail earlier in case of error. but in reality we expect these tests to pass consistently, so there is no difference

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a while ago, but I think the reason for that change was to keep consistency with the change I made to https://github.com/kubernetes/kubernetes/pull/60671/files#diff-4a7d8678f8505e86c1c6cc626cdc5eeeL608
In that case the way the polling was set up was completely broken, and it would always succeed within the first 0-5 seconds, even if the garbage collector mistakenly deleted the replica set.

Discussion about it here: #60671 (comment)

@roycaihw
Copy link
Member

roycaihw commented Jun 1, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2018
@liggitt
Copy link
Member

liggitt commented Jun 1, 2018

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, jennybuckley, liggitt, roycaihw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2018
@liggitt
Copy link
Member

liggitt commented Jun 1, 2018

/retest

@liggitt
Copy link
Member

liggitt commented Jun 5, 2018

@k8s-ci-robot k8s-ci-robot added this to the v1.11 milestone Jun 5, 2018
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@caesarxuchao @jennybuckley @liggitt @roycaihw

Pull Request Labels
  • sig/api-machinery: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.
Help

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@jennybuckley
Copy link
Author

/retest

@liggitt
Copy link
Member

liggitt commented Jun 5, 2018

fyi, fix in progress for that flake in #64726

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 6c89575 into kubernetes:master Jun 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants