-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use btree for watch cache storage to serve LIST more efficiently #128415
Conversation
/sig api-machinery |
func newStoreIndexer(indexers *cache.Indexers) storeIndexer { | ||
return cache.NewIndexer(storeElementKey, storeElementIndexers(indexers)) | ||
if utilfeature.DefaultFeatureGate.Enabled(features.BtreeWatchCache) { | ||
return newThreadedBtreeStoreIndexer(storeElementIndexers(indexers), 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's highlight 32
as a constant please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOne
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! especially for adding more notes on how you got that number
083858f
to
5b6f39c
Compare
5b6f39c
to
4b80c95
Compare
/test pull-kubernetes-e2e-gce-100-performance |
/triage accepted |
Great to see this new feature! Thanks @serathius. |
Changelog suggestion -Use btree for watch cache storage to serve LIST more efficiently. Can be disabled via BtreeWatchCache feature flag.
+Adopted a new implementation of watch caches for **list** verbs, using a btree data structure. The new implementation is active by default; you can opt out by disabling the `BtreeWatchCache` feature gate. In docs we write HTTP-level verbs (eg POST) in all-uppercase, and API-level logical verbs (eg create) in bold lower case. |
@@ -96,6 +96,12 @@ const ( | |||
// This feature is currently PRE-ALPHA and MUST NOT be enabled outside of integration tests. | |||
TestOnlyCBORServingAndStorage featuregate.Feature = "TestOnlyCBORServingAndStorage" | |||
|
|||
// owner: @serathius | |||
// beta: v1.32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer add information about stages here, as this is now easy to get from VersionedSpec below.
But please add a link to kep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no KEP, got agreement from api machinery leads to move forward with btree without KEP until we want to extend caching to more API surface.
"k8s.io/client-go/tools/cache" | ||
) | ||
|
||
const ( | ||
// btreeDegree defines the degree of btree storage. | ||
// Determined based on running BenchmarkStore, higher values bring diminishing benefits for LIST, while increasing cost of updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How the difference looks between e.g. 16 and 32 then?
Can you provide a bit more detail on how exactly this value was determined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I selected 32 based during original PoC for serving exact requests from watch cache. If I remember correctly the benchmark showed that snapshotting btree gets more and more costly as degree, and 32 was a nice balance between them. It also nicely matched the degree of btree used by etcd.
Tried to repeat the tests assuming only the current state of the code however, as we don't snapshot btree there no negative factor of using the higher degree. Increasing the degree just brings diminishing improvements.
│ btree-8.txt │ btree-16.txt │ btree-32.txt │ btree-64.txt │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │
StoreListCreate/RV=NotOlderThan-12 238.2µ ± 3% 231.5µ ± 3% -2.82% (p=0.015 n=10) 224.1µ ± 4% -5.92% (p=0.001 n=10) 221.1µ ± 2% -7.16% (p=0.000 n=10)
StoreList/RV=NotOlderThan-12 679.3µ ± 2% 682.3µ ± 1% ~ (p=0.579 n=10) 682.3µ ± 2% ~ (p=0.796 n=10) 677.1µ ± 2% ~ (p=0.247 n=10)
StoreList/Paginate/RV=NotOlderThan-12 16.47m ± 1% 16.51m ± 1% ~ (p=0.853 n=10) 16.55m ± 1% ~ (p=0.052 n=10) 16.43m ± 1% ~ (p=0.393 n=10)
StoreList/Namespace/RV=NotOlderThan-12 973.0µ ± 5% 978.2µ ± 4% ~ (p=0.631 n=10) 976.1µ ± 3% ~ (p=1.000 n=10) 959.4µ ± 3% ~ (p=0.436 n=10)
geomean 1.269m 1.264m -0.42% 1.254m -1.22% 1.239m -2.34%
│ btree-8.txt │ btree-16.txt │ btree-32.txt │ btree-64.txt │
│ B/op │ B/op vs base │ B/op vs base │ B/op vs base │
StoreListCreate/RV=NotOlderThan-12 239.6Ki ± 7% 241.1Ki ± 6% ~ (p=0.971 n=10) 242.6Ki ± 3% ~ (p=0.436 n=10) 241.3Ki ± 3% ~ (p=0.912 n=10)
StoreList/RV=NotOlderThan-12 6.007Mi ± 0% 6.007Mi ± 0% ~ (p=0.724 n=10) 6.007Mi ± 0% ~ (p=0.739 n=10) 6.007Mi ± 0% ~ (p=0.839 n=10)
StoreList/Paginate/RV=NotOlderThan-12 64.19Mi ± 0% 64.19Mi ± 0% ~ (p=0.971 n=10) 64.20Mi ± 0% ~ (p=0.123 n=10) 64.19Mi ± 0% ~ (p=0.796 n=10)
StoreList/Namespace/RV=NotOlderThan-12 6.523Mi ± 1% 6.523Mi ± 1% ~ (p=0.796 n=10) 6.524Mi ± 1% ~ (p=0.093 n=10) 6.523Mi ± 1% ~ (p=0.481 n=10)
geomean 4.925Mi 4.933Mi +0.16% 4.941Mi +0.31% 4.934Mi +0.18%
│ btree-8.txt │ btree-16.txt │ btree-32.txt │ btree-64.txt │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │
StoreListCreate/RV=NotOlderThan-12 559.0 ± 0% 559.0 ± 0% ~ (p=1.000 n=10) 559.0 ± 0% ~ (p=0.635 n=10) 558.5 ± 0% -0.09% (p=0.029 n=10)
StoreList/RV=NotOlderThan-12 87.50 ± 1% 87.50 ± 1% ~ (p=1.000 n=10) 87.00 ± 1% ~ (p=0.350 n=10) 87.00 ± 1% ~ (p=1.000 n=10)
StoreList/Paginate/RV=NotOlderThan-12 492.9k ± 0% 492.7k ± 0% -0.04% (p=0.023 n=10) 492.9k ± 0% ~ (p=0.542 n=10) 492.7k ± 0% -0.05% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12 6.214k ± 3% 6.213k ± 3% ~ (p=0.869 n=10) 6.219k ± 2% ~ (p=0.100 n=10) 6.210k ± 3% ~ (p=0.445 n=10)
geomean 3.499k 3.498k -0.01% 3.494k -0.12% 3.492k -0.19%
What would you preferred way of documenting this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would first want to understand how it was even chosen.
The way I look at it, I see a diminishing return even when going to 16 and 32, not just 64.
Why do you claim 32 as a sweet spot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now my only remaining question in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeated the tests
│ 2 │ 4 │ 8 │ 16 │ 32 │ 64 │ 128 │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │
StoreCreateList/RV=NotOlderThan-24 473.0µ ± 11% 430.1µ ± 9% -9.08% (p=0.005 n=10) 427.9µ ± 6% -9.54% (p=0.002 n=10) 403.9µ ± 8% -14.62% (p=0.000 n=10) 401.0µ ± 4% -15.22% (p=0.000 n=10) 408.0µ ± 4% -13.75% (p=0.000 n=10) 385.9µ ± 4% -18.42% (p=0.000 n=10)
StoreCreateList/RV=ExactMatch-24 604.7µ ± 4% 596.7µ ± 8% ~ (p=0.529 n=10) 604.6µ ± 4% ~ (p=0.971 n=10) 601.1µ ± 4% ~ (p=0.853 n=10) 611.0µ ± 6% ~ (p=0.105 n=10) 598.2µ ± 5% ~ (p=0.579 n=10) 608.2µ ± 3% ~ (p=0.796 n=10)
StoreList/List=All/Paginate=False/RV=Empty-24 729.1µ ± 5% 692.9µ ± 3% -4.96% (p=0.002 n=10) 693.7µ ± 3% -4.86% (p=0.000 n=10) 688.3µ ± 1% -5.59% (p=0.000 n=10) 690.4µ ± 5% -5.31% (p=0.002 n=10) 689.7µ ± 2% -5.40% (p=0.000 n=10) 687.8µ ± 3% -5.67% (p=0.000 n=10)
StoreList/List=All/Paginate=True/RV=Empty-24 19.51m ± 2% 19.84m ± 2% ~ (p=0.105 n=10) 19.89m ± 3% ~ (p=0.190 n=10) 19.64m ± 4% ~ (p=0.853 n=10) 19.34m ± 4% ~ (p=0.481 n=10) 20.22m ± 4% +3.66% (p=0.007 n=10) 19.58m ± 4% ~ (p=0.912 n=10)
StoreList/List=Namespace/Paginate=False/RV=Empty-24 1.672m ± 4% 1.635m ± 2% ~ (p=0.247 n=10) 1.673m ± 5% ~ (p=0.631 n=10) 1.657m ± 2% ~ (p=0.971 n=10) 1.656m ± 4% ~ (p=0.739 n=10) 1.678m ± 2% ~ (p=0.631 n=10) 1.718m ± 8% ~ (p=0.105 n=10)
geomean 1.467m 1.420m -3.24% 1.430m -2.58% 1.403m -4.38% 1.402m -4.46% 1.417m -3.44% 1.403m -4.41%
│ 2 │ 4 │ 8 │ 16 │ 32 │ 64 │ 128 │
│ B/op │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │
StoreCreateList/RV=NotOlderThan-24 98.58Ki ± 11% 101.33Ki ± 13% ~ (p=0.280 n=10) 99.80Ki ± 26% ~ (p=0.353 n=10) 109.63Ki ± 9% ~ (p=0.075 n=10) 112.56Ki ± 6% +14.18% (p=0.007 n=10) 114.41Ki ± 10% +16.05% (p=0.003 n=10) 115.06Ki ± 12% +16.72% (p=0.011 n=10)
StoreCreateList/RV=ExactMatch-24 117.1Ki ± 0% 117.5Ki ± 0% ~ (p=0.218 n=10) 116.9Ki ± 0% ~ (p=0.052 n=10) 117.3Ki ± 0% ~ (p=0.353 n=10) 116.9Ki ± 0% ~ (p=0.075 n=10) 117.0Ki ± 0% ~ (p=0.436 n=10) 117.0Ki ± 0% ~ (p=0.280 n=10)
StoreList/List=All/Paginate=False/RV=Empty-24 6.023Mi ± 0% 6.024Mi ± 0% +0.01% (p=0.037 n=10) 6.024Mi ± 0% ~ (p=0.493 n=10) 6.024Mi ± 0% +0.01% (p=0.035 n=10) 6.024Mi ± 0% ~ (p=0.247 n=10) 6.024Mi ± 0% ~ (p=0.247 n=10) 6.024Mi ± 0% ~ (p=0.315 n=10)
StoreList/List=All/Paginate=True/RV=Empty-24 64.22Mi ± 0% 64.21Mi ± 0% ~ (p=0.075 n=10) 64.23Mi ± 0% ~ (p=0.280 n=10) 64.21Mi ± 0% -0.02% (p=0.002 n=10) 64.22Mi ± 0% ~ (p=0.579 n=10) 64.22Mi ± 0% ~ (p=0.971 n=10) 64.22Mi ± 0% ~ (p=1.000 n=10)
StoreList/List=Namespace/Paginate=False/RV=Empty-24 8.177Mi ± 0% 8.178Mi ± 0% ~ (p=0.579 n=10) 8.177Mi ± 0% ~ (p=0.971 n=10) 8.179Mi ± 0% ~ (p=0.579 n=10) 8.178Mi ± 0% ~ (p=0.739 n=10) 8.179Mi ± 0% ~ (p=0.315 n=10) 8.176Mi ± 0% ~ (p=0.247 n=10)
geomean 2.034Mi 2.047Mi +0.61% 2.039Mi +0.22% 2.079Mi +2.19% 2.088Mi +2.66% 2.095Mi +3.01% 2.098Mi +3.12%
│ 2 │ 4 │ 8 │ 16 │ 32 │ 64 │ 128 │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │
StoreCreateList/RV=NotOlderThan-24 560.0 ± 0% 558.0 ± 0% -0.36% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) 558.0 ± 0% -0.36% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10)
StoreCreateList/RV=ExactMatch-24 871.0 ± 0% 870.0 ± 0% -0.11% (p=0.038 n=10) 870.0 ± 0% -0.11% (p=0.004 n=10) 870.0 ± 0% -0.11% (p=0.005 n=10) 869.0 ± 0% -0.23% (p=0.000 n=10) 870.0 ± 0% -0.11% (p=0.001 n=10) 870.0 ± 0% -0.11% (p=0.000 n=10)
StoreList/List=All/Paginate=False/RV=Empty-24 351.0 ± 3% 358.0 ± 1% +1.99% (p=0.034 n=10) 352.5 ± 3% ~ (p=0.589 n=10) 358.5 ± 1% +2.14% (p=0.022 n=10) 356.5 ± 3% ~ (p=0.208 n=10) 355.0 ± 3% ~ (p=0.224 n=10) 355.0 ± 3% ~ (p=0.183 n=10)
StoreList/List=All/Paginate=True/RV=Empty-24 494.4k ± 0% 494.4k ± 0% ~ (p=0.424 n=10) 494.6k ± 0% +0.06% (p=0.000 n=10) 492.7k ± 0% -0.34% (p=0.000 n=10) 494.5k ± 0% +0.02% (p=0.009 n=10) 493.0k ± 0% -0.28% (p=0.000 n=10) 494.4k ± 0% ~ (p=0.424 n=10)
StoreList/List=Namespace/Paginate=False/RV=Empty-24 32.43k ± 0% 32.44k ± 0% ~ (p=0.579 n=10) 32.43k ± 0% ~ (p=0.971 n=10) 32.45k ± 0% ~ (p=0.517 n=10) 32.44k ± 0% ~ (p=0.670 n=10) 32.46k ± 0% ~ (p=0.256 n=10) 32.41k ± 0% ~ (p=0.247 n=10)
geomean 4.872k 4.887k +0.31% 4.870k -0.03% 4.885k +0.28% 4.880k +0.17% 4.875k +0.06% 4.876k +0.08%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark point to both 16, 32 and 128 to provide the best result. I think 128 is some kind of anomaly due to lucky memory aligment and think is too large. There is no much difference between 16 and 32. I leaned towards 32 as it's the same degree as etcd uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know etcd uses 32 - it just sounds a bit high to me.
Based on the results above, I would personally choose 16, but I don't have super strong opinion here.
Let's just document that in the PR description then (copy the benchmark results and rationale why we choose the value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ab8859d
to
7a8de88
Compare
Can be disabled via BtreeWatchCache feature flag.
7a8de88
to
5ea427e
Compare
/lgtm |
LGTM label has been added. Git tree hash: 12a2d62ba5e7b4a32da21ee5ed81f0ffe1236bb6
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: serathius, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cc @MadhavJivrajani @deads2k @jpbetz @wojtek-t We got confirmation from scalability tests about the performance improvement |
@serathius Very Very Cool! thanks @MadhavJivrajani and folks! |
This is AMAZING! |
Follow up from #126754