-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix replica set hot loop #33092
fix replica set hot loop #33092
Conversation
Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line. Regular contributors should join the org to skip this step. While we transition away from the Jenkins GitHub PR Builder plugin, "ok to test" commenters will need to be on the admin list defined in this file. |
@deads2k @derekwaynecarr PTAL. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good. Some small comments.
glog.Errorf("Error syncing ReplicaSet: %v", err) | ||
} | ||
}() | ||
for rsc.processNextWorkItem() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait.Forever(rsc.processNextWorkItem, 0)
or
for {
rsc.processNextWorkItem()
}
might read better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...
for rsc.processNextWorkItem() {
}
is not a dead loop. Since processNextWorkItem()
may return false and will end the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, thanks for pointing that out.
return true | ||
} | ||
|
||
utilruntime.HandleError(fmt.Errorf("Sync %v failed with %v", key, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use '%q' for key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will fix.
if err != nil { | ||
return err | ||
} | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks not formated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will go fmt code
} | ||
return nil | ||
// return manageReplicasErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please delete commented out code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will do.
1d063d7
to
e80ef7b
Compare
@mikedanese comments addressed, PTAL. Thanks! |
rsc.expectations.DeleteExpectations(key) | ||
return nil | ||
} | ||
if err != nil { | ||
glog.Infof("Unable to retrieve ReplicaSet %v from store: %v", key, err) | ||
rsc.queue.Add(key) | ||
glog.V(4).Infof("Unable to retrieve ReplicaSet %v from store: %v", key, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error will be logged anyway in processNextWorkItem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, will fix.
utilruntime.HandleError(fmt.Errorf("Sync %q failed with %v", key, err)) | ||
rsc.queue.AddRateLimited(key) | ||
|
||
return true | ||
} | ||
|
||
// manageReplicas checks and updates replicas for the given ReplicaSet. | ||
// Does NOT modify <filteredPods>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth mentioning that we will requeue the replica set now in case of an error while creating / deleting pods. Seems that we didn't do it prior to this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I agree with you and I will add some lines comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
e80ef7b
to
6f1cce7
Compare
@Kargakis comments addressed. PTAL. |
LGTM, pending a review from @mikedanese |
pkg/controller/replicaset/replica_set_test.go:1157: possible formatting directive in Fatal call exit status 1 Makefile:99: recipe for target 'verify' failed make: *** [verify] Error 1 |
6f1cce7
to
fc0266b
Compare
Jenkins GCI Kubemark GCE e2e failed for commit fc0266b. Full PR test history. The magic incantation to run this job again is |
go run hack/e2e.go -v -test --test_args='--ginkgo.focus=Up$' error running up: exit status 1 from junit_runner.xml |
please verify that replication controller also doesn't suffer from the same issue (I haven't opened the pr, just noticed that files touched didn't include replication_conroller.go) |
I think @deads2k has fixed replication controller hot loop? |
yes On Sat, Sep 24, 2016 at 4:55 AM, DuJun notifications@github.com wrote:
|
kubernetes-pull-build-test-gci-kubemark-e2e-gce #20 Recent runs @wojtek-t kubemark test failed(Perhaps it's no my fault), please take a look. |
CI failed by flake issue #33459 Can anyone please ping rebot? |
oh
I guess we're leaking networks? @kubernetes/goog-testing (networks is a rare leak, I'll clean them up but the source is unknown) |
@k8s-bot test this github issue: #IGNORE |
Ah, this failure looks more familiar: #33617 |
@bprashanth can we move forward with merging the PR since the change seems unrelated to the flake? I would like to have this merged and add Conditions on top for both controllers. |
Jenkins GKE smoke e2e failed for commit fc0266b. Full PR test history. The magic incantation to run this job again is |
How can we restart the Jenkins GCI Kubemark GCE e2e suite? |
} | ||
return nil | ||
return utilerrors.NewAggregate(errlist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RC controller is not aggregating errors but instead it will return them separately: there are only two of them anyway, one error representing a replica create/delete failure and one about a failed RC status update. In the first case it will retry forever albeit being rate-limited and in the second case it will only retry once more. There is no reason for not having both controllers being consistent as far as error handling is concerned. I am not sure though if the RC controller is 100% correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there is no difference on a second look I had. Still I would prefer to follow the same pattern as the RC controller and not aggregate here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your advice, I will have a deep look at RC controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be an easy fix: you basically don't aggregate here and return just the manageReplicas error. In the if clause above you return the status update error instead of appending to the list.
fc0266b
to
41ecaaf
Compare
Change-Id: I5176eb9350324de8e7b2035686c4e2c2abc5ef3d
41ecaaf
to
a76a743
Compare
Sorry for delay, comments addressed. PTAL. Thanks! |
Jenkins GCI GCE e2e failed for commit a76a743. Full PR test history. The magic incantation to run this job again is |
Jenkins Kubemark GCE e2e failed for commit a76a743. Full PR test history. The magic incantation to run this job again is |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
What this PR does / why we need it:
Fix replicas set hot loop.
Related issue: #30629
This change is