-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed integer overflow bug in rate limiter. #31396
Conversation
// The maximum exponent maxExp is chosen such that the 'calculated' value never | ||
// exceeds MaxInt64. | ||
maxExp := int(math.Log10(float64(math.MaxInt64)/float64(r.baseDelay.Nanoseconds())) - 1) | ||
if exp > maxExp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe just cap it? the probelm is that time.Duration can't handle the float right
backoff := math.Pod10(r.failures[items])
if backoff >= math.MaxInt64 {
return r.maxDelay
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, simplified with an early return.
Just the one nit, thanks |
9811ee2
to
00e4188
Compare
GCE e2e build/test passed for commit 00e4188. |
Review status: 0 of 2 files reviewed at latest revision, 3 unresolved discussions. pkg/util/workqueue/default_rate_limiters.go, line 97 [r2] (raw file):
I am surprised the exponent base is not 2 at the most. 10 seems really really steep (100 ms -> 1 s -> 10 s -> 100 s -> 1000s). pkg/util/workqueue/default_rate_limiters.go, line 98 [r2] (raw file):
I wonder what we should do here. I suspect the item probably shouldn't even be enqueued at this point. FWIW MaxInt64 in ns is ~500 Years. Will probably do a master upgrade before the duration expires :) FWIW, I always liked this rater limiter implementation https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/RateLimiter.java Comments from Reviewable |
Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions. pkg/util/workqueue/default_rate_limiters.go, line 97 [r2] (raw file):
We could potentially just short circuit this whole thing and say if the exp > 20, return or something. Comments from Reviewable |
pkg/util/workqueue/default_rate_limiters.go, line 98 [r2] (raw file):
|
pkg/util/workqueue/default_rate_limiters.go, line 97 [r2] (raw file):
|
I don't have a 2 vs 10 preference. lgtm. Seems like this should be 1.4. |
This may be cherry pick candidate? |
@deads2k I agree that this should be in 1.4. Conviniently I am the release czar that gets to make those decisions :) |
Reviewed 1 of 2 files at r1. Comments from Reviewable |
Yes, 1.3. The symptom seemed like something that could affect us in
1.3 - I thought we converted a few of the core controllers to delaying
wait queues before 1.3 shipped.
|
+lgtm Reviewed 1 of 2 files at r1. Comments from Reviewable |
@foxish Thanks |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
calculated := r.baseDelay * time.Duration(math.Pow10(r.failures[item]-1)) | ||
// The backoff is capped such that 'calculated' value never overflows. | ||
backoff := float64(r.baseDelay.Nanoseconds()) * math.Pow10(exp) | ||
if backoff > math.MaxInt64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please cap exp before the Pow10 instead of doing a float64 comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry didn't catch this last night. Youre implicitly converting maxint64 into a float larger than maxint64. You can early return with something like exp >= int64(math.Log10(math.MaxInt64)) instaed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Will follow up with another PR fixing this.
GCE e2e build/test passed for commit 00e4188. |
Automatic merge from submit-queue |
Automatic merge from submit-queue Bump glbc to 0.8.0 Don't think this hits the bar for 1.4.0, but hopefully it can make 1.4.1. The version bump is for the godep update that fixes an issue with the throttling workqueue (kubernetes/kubernetes#31396). I should've done this sooner, dropped it. Also fixes #1776 and #1783
Automatic merge from submit-queue Bump glbc to 0.8.0 Don't think this hits the bar for 1.4.0, but hopefully it can make 1.4.1. The version bump is for the godep update that fixes an issue with the throttling workqueue (kubernetes/kubernetes#31396). I should've done this sooner, dropped it. Also fixes kubernetes-retired#1776 and kubernetes-retired#1783
This PR fixes a bug in the delayed work-queue used by some controllers.
The integer overflow bug would previously cause hotlooping behavior after a few failures
as
time.Duration(..)
on values larger than MaxInt64 behaves unpredictably, andafter a certain value returns 0 always.
cc @bprashanth @pwittrock
This change is