Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix replica set hot loop #33092

Merged

Conversation

m1093782566
Copy link
Contributor

@m1093782566 m1093782566 commented Sep 20, 2016

What this PR does / why we need it:

Fix replicas set hot loop.

Related issue: #30629


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line.

Regular contributors should join the org to skip this step.

While we transition away from the Jenkins GitHub PR Builder plugin, "ok to test" commenters will need to be on the admin list defined in this file.

@m1093782566
Copy link
Contributor Author

@deads2k @derekwaynecarr PTAL.

Thanks!

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Sep 20, 2016
Copy link
Member

@mikedanese mikedanese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good. Some small comments.

glog.Errorf("Error syncing ReplicaSet: %v", err)
}
}()
for rsc.processNextWorkItem() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait.Forever(rsc.processNextWorkItem, 0)

or

for {
    rsc.processNextWorkItem()
}

might read better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...

for rsc.processNextWorkItem() {
}

is not a dead loop. Since processNextWorkItem() may return false and will end the loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, thanks for pointing that out.

return true
}

utilruntime.HandleError(fmt.Errorf("Sync %v failed with %v", key, err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use '%q' for key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will fix.

if err != nil {
return err
}
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks not formated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will go fmt code

}
return nil
// return manageReplicasErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please delete commented out code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will do.

@derekwaynecarr derekwaynecarr added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Sep 20, 2016
@m1093782566 m1093782566 force-pushed the m109-fix-rs-hostloop branch 2 times, most recently from 1d063d7 to e80ef7b Compare September 21, 2016 02:58
@m1093782566
Copy link
Contributor Author

@mikedanese comments addressed, PTAL.

Thanks!

rsc.expectations.DeleteExpectations(key)
return nil
}
if err != nil {
glog.Infof("Unable to retrieve ReplicaSet %v from store: %v", key, err)
rsc.queue.Add(key)
glog.V(4).Infof("Unable to retrieve ReplicaSet %v from store: %v", key, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error will be logged anyway in processNextWorkItem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, will fix.

utilruntime.HandleError(fmt.Errorf("Sync %q failed with %v", key, err))
rsc.queue.AddRateLimited(key)

return true
}

// manageReplicas checks and updates replicas for the given ReplicaSet.
// Does NOT modify <filteredPods>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth mentioning that we will requeue the replica set now in case of an error while creating / deleting pods. Seems that we didn't do it prior to this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I agree with you and I will add some lines comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@m1093782566
Copy link
Contributor Author

@Kargakis comments addressed. PTAL.

@0xmichalis
Copy link
Contributor

LGTM, pending a review from @mikedanese

@mikedanese mikedanese added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2016
@m1093782566
Copy link
Contributor Author

m1093782566 commented Sep 24, 2016

pkg/controller/replicaset/replica_set_test.go:1157: possible formatting directive in Fatal call

exit status 1

Makefile:99: recipe for target 'verify' failed

make: *** [verify] Error 1

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI Kubemark GCE e2e failed for commit fc0266b. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark gci e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@m1093782566
Copy link
Contributor Author

m1093782566 commented Sep 24, 2016

go run hack/e2e.go -v -test --test_args='--ginkgo.focus=Up$'

error running up: exit status 1 from junit_runner.xml

@bprashanth
Copy link
Contributor

please verify that replication controller also doesn't suffer from the same issue (I haven't opened the pr, just noticed that files touched didn't include replication_conroller.go)

@m1093782566
Copy link
Contributor Author

please verify that replication controller also doesn't suffer from the same issue

I think @deads2k has fixed replication controller hot loop?

@0xmichalis
Copy link
Contributor

yes

On Sat, Sep 24, 2016 at 4:55 AM, DuJun notifications@github.com wrote:

please verify that replication controller also doesn't suffer from the
same issue

I think @deads2k https://github.com/deads2k has fixed replication
controller hot loop?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#33092 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADuFfypeyVZT-KznU5txrB-J8tqEfgUbks5qtJEygaJpZM4KBrjv
.

@m1093782566
Copy link
Contributor Author

m1093782566 commented Sep 25, 2016

kubernetes-pull-build-test-gci-kubemark-e2e-gce #20

Recent runs
PR m1093782566: fix replica set hot loop
Result FAILURE
Started 2016-09-24 08:57 CST
Elapsed 14m30s
Version v1.5.0-alpha.0.1316+971567d5f6a819
Builder agent-pr-24

@wojtek-t kubemark test failed(Perhaps it's no my fault), please take a look.

@m1093782566
Copy link
Contributor Author

CI failed by flake issue #33459

Can anyone please ping rebot?

@bprashanth
Copy link
Contributor

oh

WARNING: You are creating a legacy network. Using --mode=legacy will be required in future releases.
ERROR: (gcloud.compute.networks.create) Some requests did not succeed:
 - Quota 'NETWORKS' exceeded. Limit: 5.0

I guess we're leaking networks? @kubernetes/goog-testing (networks is a rare leak, I'll clean them up but the source is unknown)

@bprashanth
Copy link
Contributor

@k8s-bot test this github issue: #IGNORE

@bprashanth
Copy link
Contributor

Ah, this failure looks more familiar: #33617
I think today's just a bad day for flakes. Going to wait a bit before kicking it. If someone doesn't get to debuggin the issue, I probably will at some point, and then we can restart the test.

@0xmichalis
Copy link
Contributor

@bprashanth can we move forward with merging the PR since the change seems unrelated to the flake? I would like to have this merged and add Conditions on top for both controllers.

@bprashanth
Copy link
Contributor

@k8s-bot test this github issue: #33617

@k8s-ci-robot
Copy link
Contributor

Jenkins GKE smoke e2e failed for commit fc0266b. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@0xmichalis
Copy link
Contributor

@k8s-bot test this issue: #33388

@0xmichalis
Copy link
Contributor

@k8s-bot gci test this issue: #33388

@0xmichalis
Copy link
Contributor

How can we restart the Jenkins GCI Kubemark GCE e2e suite?

}
return nil
return utilerrors.NewAggregate(errlist)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RC controller is not aggregating errors but instead it will return them separately: there are only two of them anyway, one error representing a replica create/delete failure and one about a failed RC status update. In the first case it will retry forever albeit being rate-limited and in the second case it will only retry once more. There is no reason for not having both controllers being consistent as far as error handling is concerned. I am not sure though if the RC controller is 100% correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is no difference on a second look I had. Still I would prefer to follow the same pattern as the RC controller and not aggregate here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advice, I will have a deep look at RC controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be an easy fix: you basically don't aggregate here and return just the manageReplicas error. In the if clause above you return the status update error instead of appending to the list.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2016
@m1093782566 m1093782566 force-pushed the m109-fix-rs-hostloop branch from fc0266b to 41ecaaf Compare October 8, 2016 07:52
Change-Id: I5176eb9350324de8e7b2035686c4e2c2abc5ef3d
@m1093782566 m1093782566 force-pushed the m109-fix-rs-hostloop branch from 41ecaaf to a76a743 Compare October 8, 2016 08:00
@m1093782566
Copy link
Contributor Author

m1093782566 commented Oct 8, 2016

@Kargakis

Sorry for delay, comments addressed. PTAL.

Thanks!

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 8, 2016
@0xmichalis 0xmichalis added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GCE e2e failed for commit a76a743. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins Kubemark GCE e2e failed for commit a76a743. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit e10f122 into kubernetes:master Oct 8, 2016
@m1093782566 m1093782566 deleted the m109-fix-rs-hostloop branch October 17, 2016 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants