fix limitranger to handle latent caches without live lookups every time #21470
Conversation
Labelling this PR as size/L
    return nil
lruList, ok := l.liveLookupCache.Get(a.GetNamespace())
if !ok {
    liveList, err := l.client.Core().LimitRanges(a.GetNamespace()).List(api.ListOptions{})
Isn't this prone to being wrong if the LRU cache is never cleared? For example, if you add a limit range to a namespace and later remove it, the LRU cache could have the wrong data. Also, what happens if I delete and re-add the same namespace, but the new namespace has no limit range? It looks like this could return wrong data in that scenario.
I'll add a ttl to the cache entry to timebound the wrongness. One minute ought to do for the majority of cases. Once you're beyond that the locality doesn't help.
I would prefer a flag on the admission controller that says to always do a live lookup, and downstream we can run with that flag enabled... that seems less error-prone and brings in fewer dependencies.
GCE e2e build/test failed for commit 91b2e5ad1753d6afb898944e3b269e62ee3b19bb.
After an in-person chat, I see the point that it's best to fail by default if LimitRange is viewed as a more formal aspect of the security apparatus. With a TTL on the LRU cache to handle re-using namespace names, this is fine.
Actually, the TTL is problematic, since it will keep re-checking those namespaces that don't use LimitRange at all... I guess if that is a major concern, they can choose not to run with the admission controller enabled.
And the re-lookup boundary would be hit only a couple times per minute, not every request.
sgtm
Force-pushed 2826422 to 324ece0.
ttl added.
cool, lgtm.
GCE e2e build/test failed for commit 2826422ccfc25af04d335bde53655e5b293aab72.
GCE e2e test build/test passed for commit 324ece0b1bcbee018ce1442ab96a0cdacc1a3314.
Force-pushed 324ece0 to a34da7f.
GCE e2e build/test failed for commit a34da7f3649c22a591ce004145b514ce26ede60b.
Force-pushed a34da7f to c3ef573.
GCE e2e test build/test passed for commit c3ef5730a977b15cdda336b3bca9048c1f210097.
Force-pushed c3ef573 to db8778f.
PR changed after LGTM, removing LGTM.
GCE e2e test build/test passed for commit db8778fcff6c15af5070093d41a47ad555052780.
Force-pushed db8778f to 24d5329.
GCE e2e test build/test passed for commit 24d5329.
fix for p1 flake
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e test build/test passed for commit 24d5329.
Automatic merge from submit-queue
Auto commit by PR queue bot
Fixes #21234
If a limit range and a pod are created very close together, then the cache for the limitranger will not be up to date and I'll be able to sneak a pod in.
It is too expensive to do a live lookup every time. Instead, an empty answer from the indexer triggers a lookup in an LRU cache of live lookups. If that results in a cache miss, then an actual live listing is done and the LRU cache is primed.
This brought in a new LRU cache library which is thread-safe (the existing one is not).