Use btree for watch cache storage to serve LIST more efficiently #128415

serathius · 2024-10-29T15:18:37Z

Adopted a new implementation of watch caches for **list** verbs, using a btree data structure. The new implementation is active by default; you can opt out by disabling the `BtreeWatchCache` feature gate.

Follow up from #126754

serathius · 2024-10-29T15:18:59Z

/sig api-machinery
/kind feature

dims · 2024-10-29T18:35:58Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/store.go

 func newStoreIndexer(indexers *cache.Indexers) storeIndexer {
-	return cache.NewIndexer(storeElementKey, storeElementIndexers(indexers))
+	if utilfeature.DefaultFeatureGate.Enabled(features.BtreeWatchCache) {
+		return newThreadedBtreeStoreIndexer(storeElementIndexers(indexers), 32)


Let's highlight 32 as a constant please?

thank you! especially for adding more notes on how you got that number

serathius · 2024-10-29T19:28:34Z

/test pull-kubernetes-e2e-gce-100-performance

fedebongio · 2024-10-29T20:24:19Z

/triage accepted

mengqiy · 2024-10-30T16:38:50Z

Great to see this new feature! Thanks @serathius.
I wonder if there are any new metrics that have been added. I don't see any in #126754

sftim · 2024-10-31T19:08:52Z

Changelog suggestion

-Use btree for watch cache storage to serve LIST more efficiently. Can be disabled via BtreeWatchCache feature flag.
+Adopted a new implementation of watch caches for **list** verbs, using a btree data structure. The new implementation is active by default; you can opt out by disabling the `BtreeWatchCache` feature gate.

In docs we write HTTP-level verbs (eg POST) in all-uppercase, and API-level logical verbs (eg create) in bold lower case.

serathius · 2024-11-02T10:08:19Z

/cc @wojtek-t @deads2k @jpbetz

wojtek-t · 2024-11-04T08:16:01Z

staging/src/k8s.io/apiserver/pkg/features/kube_features.go

@@ -96,6 +96,12 @@ const (
 	// This feature is currently PRE-ALPHA and MUST NOT be enabled outside of integration tests.
 	TestOnlyCBORServingAndStorage featuregate.Feature = "TestOnlyCBORServingAndStorage"

+	// owner: @serathius
+	// beta: v1.32


We no longer add information about stages here, as this is now easy to get from VersionedSpec below.

But please add a link to kep.

There is no KEP, got agreement from api machinery leads to move forward with btree without KEP until we want to extend caching to more API surface.

wojtek-t · 2024-11-04T08:16:43Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/store.go

 	"k8s.io/client-go/tools/cache"
 )

+const (
+	// btreeDegree defines the degree of btree storage.
+	// Determined based on running BenchmarkStore, higher values bring diminishing benefits for LIST, while increasing cost of updates.


How the difference looks between e.g. 16 and 32 then?
Can you provide a bit more detail on how exactly this value was determined?

I selected 32 based during original PoC for serving exact requests from watch cache. If I remember correctly the benchmark showed that snapshotting btree gets more and more costly as degree, and 32 was a nice balance between them. It also nicely matched the degree of btree used by etcd.

Tried to repeat the tests assuming only the current state of the code however, as we don't snapshot btree there no negative factor of using the higher degree. Increasing the degree just brings diminishing improvements.

│ btree-8.txt │ btree-16.txt │ btree-32.txt │ btree-64.txt │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ StoreListCreate/RV=NotOlderThan-12 238.2µ ± 3% 231.5µ ± 3% -2.82% (p=0.015 n=10) 224.1µ ± 4% -5.92% (p=0.001 n=10) 221.1µ ± 2% -7.16% (p=0.000 n=10) StoreList/RV=NotOlderThan-12 679.3µ ± 2% 682.3µ ± 1% ~ (p=0.579 n=10) 682.3µ ± 2% ~ (p=0.796 n=10) 677.1µ ± 2% ~ (p=0.247 n=10) StoreList/Paginate/RV=NotOlderThan-12 16.47m ± 1% 16.51m ± 1% ~ (p=0.853 n=10) 16.55m ± 1% ~ (p=0.052 n=10) 16.43m ± 1% ~ (p=0.393 n=10) StoreList/Namespace/RV=NotOlderThan-12 973.0µ ± 5% 978.2µ ± 4% ~ (p=0.631 n=10) 976.1µ ± 3% ~ (p=1.000 n=10) 959.4µ ± 3% ~ (p=0.436 n=10) geomean 1.269m 1.264m -0.42% 1.254m -1.22% 1.239m -2.34% │ btree-8.txt │ btree-16.txt │ btree-32.txt │ btree-64.txt │ │ B/op │ B/op vs base │ B/op vs base │ B/op vs base │ StoreListCreate/RV=NotOlderThan-12 239.6Ki ± 7% 241.1Ki ± 6% ~ (p=0.971 n=10) 242.6Ki ± 3% ~ (p=0.436 n=10) 241.3Ki ± 3% ~ (p=0.912 n=10) StoreList/RV=NotOlderThan-12 6.007Mi ± 0% 6.007Mi ± 0% ~ (p=0.724 n=10) 6.007Mi ± 0% ~ (p=0.739 n=10) 6.007Mi ± 0% ~ (p=0.839 n=10) StoreList/Paginate/RV=NotOlderThan-12 64.19Mi ± 0% 64.19Mi ± 0% ~ (p=0.971 n=10) 64.20Mi ± 0% ~ (p=0.123 n=10) 64.19Mi ± 0% ~ (p=0.796 n=10) StoreList/Namespace/RV=NotOlderThan-12 6.523Mi ± 1% 6.523Mi ± 1% ~ (p=0.796 n=10) 6.524Mi ± 1% ~ (p=0.093 n=10) 6.523Mi ± 1% ~ (p=0.481 n=10) geomean 4.925Mi 4.933Mi +0.16% 4.941Mi +0.31% 4.934Mi +0.18% │ btree-8.txt │ btree-16.txt │ btree-32.txt │ btree-64.txt │ │ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ StoreListCreate/RV=NotOlderThan-12 559.0 ± 0% 559.0 ± 0% ~ (p=1.000 n=10) 559.0 ± 0% ~ (p=0.635 n=10) 558.5 ± 0% -0.09% (p=0.029 n=10) StoreList/RV=NotOlderThan-12 87.50 ± 1% 87.50 ± 1% ~ (p=1.000 n=10) 87.00 ± 1% ~ (p=0.350 n=10) 87.00 ± 1% ~ (p=1.000 n=10) StoreList/Paginate/RV=NotOlderThan-12 492.9k ± 0% 492.7k ± 0% -0.04% (p=0.023 n=10) 492.9k ± 0% ~ (p=0.542 n=10) 492.7k ± 0% -0.05% (p=0.000 n=10) StoreList/Namespace/RV=NotOlderThan-12 6.214k ± 3% 6.213k ± 3% ~ (p=0.869 n=10) 6.219k ± 2% ~ (p=0.100 n=10) 6.210k ± 3% ~ (p=0.445 n=10) geomean 3.499k 3.498k -0.01% 3.494k -0.12% 3.492k -0.19%

What would you preferred way of documenting this?

I would first want to understand how it was even chosen.
The way I look at it, I see a diminishing return even when going to 16 and 32, not just 64.
Why do you claim 32 as a sweet spot?

This is now my only remaining question in this PR

Repeated the tests

│ 2 │ 4 │ 8 │ 16 │ 32 │ 64 │ 128 │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ StoreCreateList/RV=NotOlderThan-24 473.0µ ± 11% 430.1µ ± 9% -9.08% (p=0.005 n=10) 427.9µ ± 6% -9.54% (p=0.002 n=10) 403.9µ ± 8% -14.62% (p=0.000 n=10) 401.0µ ± 4% -15.22% (p=0.000 n=10) 408.0µ ± 4% -13.75% (p=0.000 n=10) 385.9µ ± 4% -18.42% (p=0.000 n=10) StoreCreateList/RV=ExactMatch-24 604.7µ ± 4% 596.7µ ± 8% ~ (p=0.529 n=10) 604.6µ ± 4% ~ (p=0.971 n=10) 601.1µ ± 4% ~ (p=0.853 n=10) 611.0µ ± 6% ~ (p=0.105 n=10) 598.2µ ± 5% ~ (p=0.579 n=10) 608.2µ ± 3% ~ (p=0.796 n=10) StoreList/List=All/Paginate=False/RV=Empty-24 729.1µ ± 5% 692.9µ ± 3% -4.96% (p=0.002 n=10) 693.7µ ± 3% -4.86% (p=0.000 n=10) 688.3µ ± 1% -5.59% (p=0.000 n=10) 690.4µ ± 5% -5.31% (p=0.002 n=10) 689.7µ ± 2% -5.40% (p=0.000 n=10) 687.8µ ± 3% -5.67% (p=0.000 n=10) StoreList/List=All/Paginate=True/RV=Empty-24 19.51m ± 2% 19.84m ± 2% ~ (p=0.105 n=10) 19.89m ± 3% ~ (p=0.190 n=10) 19.64m ± 4% ~ (p=0.853 n=10) 19.34m ± 4% ~ (p=0.481 n=10) 20.22m ± 4% +3.66% (p=0.007 n=10) 19.58m ± 4% ~ (p=0.912 n=10) StoreList/List=Namespace/Paginate=False/RV=Empty-24 1.672m ± 4% 1.635m ± 2% ~ (p=0.247 n=10) 1.673m ± 5% ~ (p=0.631 n=10) 1.657m ± 2% ~ (p=0.971 n=10) 1.656m ± 4% ~ (p=0.739 n=10) 1.678m ± 2% ~ (p=0.631 n=10) 1.718m ± 8% ~ (p=0.105 n=10) geomean 1.467m 1.420m -3.24% 1.430m -2.58% 1.403m -4.38% 1.402m -4.46% 1.417m -3.44% 1.403m -4.41% │ 2 │ 4 │ 8 │ 16 │ 32 │ 64 │ 128 │ │ B/op │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ StoreCreateList/RV=NotOlderThan-24 98.58Ki ± 11% 101.33Ki ± 13% ~ (p=0.280 n=10) 99.80Ki ± 26% ~ (p=0.353 n=10) 109.63Ki ± 9% ~ (p=0.075 n=10) 112.56Ki ± 6% +14.18% (p=0.007 n=10) 114.41Ki ± 10% +16.05% (p=0.003 n=10) 115.06Ki ± 12% +16.72% (p=0.011 n=10) StoreCreateList/RV=ExactMatch-24 117.1Ki ± 0% 117.5Ki ± 0% ~ (p=0.218 n=10) 116.9Ki ± 0% ~ (p=0.052 n=10) 117.3Ki ± 0% ~ (p=0.353 n=10) 116.9Ki ± 0% ~ (p=0.075 n=10) 117.0Ki ± 0% ~ (p=0.436 n=10) 117.0Ki ± 0% ~ (p=0.280 n=10) StoreList/List=All/Paginate=False/RV=Empty-24 6.023Mi ± 0% 6.024Mi ± 0% +0.01% (p=0.037 n=10) 6.024Mi ± 0% ~ (p=0.493 n=10) 6.024Mi ± 0% +0.01% (p=0.035 n=10) 6.024Mi ± 0% ~ (p=0.247 n=10) 6.024Mi ± 0% ~ (p=0.247 n=10) 6.024Mi ± 0% ~ (p=0.315 n=10) StoreList/List=All/Paginate=True/RV=Empty-24 64.22Mi ± 0% 64.21Mi ± 0% ~ (p=0.075 n=10) 64.23Mi ± 0% ~ (p=0.280 n=10) 64.21Mi ± 0% -0.02% (p=0.002 n=10) 64.22Mi ± 0% ~ (p=0.579 n=10) 64.22Mi ± 0% ~ (p=0.971 n=10) 64.22Mi ± 0% ~ (p=1.000 n=10) StoreList/List=Namespace/Paginate=False/RV=Empty-24 8.177Mi ± 0% 8.178Mi ± 0% ~ (p=0.579 n=10) 8.177Mi ± 0% ~ (p=0.971 n=10) 8.179Mi ± 0% ~ (p=0.579 n=10) 8.178Mi ± 0% ~ (p=0.739 n=10) 8.179Mi ± 0% ~ (p=0.315 n=10) 8.176Mi ± 0% ~ (p=0.247 n=10) geomean 2.034Mi 2.047Mi +0.61% 2.039Mi +0.22% 2.079Mi +2.19% 2.088Mi +2.66% 2.095Mi +3.01% 2.098Mi +3.12% │ 2 │ 4 │ 8 │ 16 │ 32 │ 64 │ 128 │ │ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ StoreCreateList/RV=NotOlderThan-24 560.0 ± 0% 558.0 ± 0% -0.36% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) 558.0 ± 0% -0.36% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) 557.0 ± 0% -0.54% (p=0.000 n=10) StoreCreateList/RV=ExactMatch-24 871.0 ± 0% 870.0 ± 0% -0.11% (p=0.038 n=10) 870.0 ± 0% -0.11% (p=0.004 n=10) 870.0 ± 0% -0.11% (p=0.005 n=10) 869.0 ± 0% -0.23% (p=0.000 n=10) 870.0 ± 0% -0.11% (p=0.001 n=10) 870.0 ± 0% -0.11% (p=0.000 n=10) StoreList/List=All/Paginate=False/RV=Empty-24 351.0 ± 3% 358.0 ± 1% +1.99% (p=0.034 n=10) 352.5 ± 3% ~ (p=0.589 n=10) 358.5 ± 1% +2.14% (p=0.022 n=10) 356.5 ± 3% ~ (p=0.208 n=10) 355.0 ± 3% ~ (p=0.224 n=10) 355.0 ± 3% ~ (p=0.183 n=10) StoreList/List=All/Paginate=True/RV=Empty-24 494.4k ± 0% 494.4k ± 0% ~ (p=0.424 n=10) 494.6k ± 0% +0.06% (p=0.000 n=10) 492.7k ± 0% -0.34% (p=0.000 n=10) 494.5k ± 0% +0.02% (p=0.009 n=10) 493.0k ± 0% -0.28% (p=0.000 n=10) 494.4k ± 0% ~ (p=0.424 n=10) StoreList/List=Namespace/Paginate=False/RV=Empty-24 32.43k ± 0% 32.44k ± 0% ~ (p=0.579 n=10) 32.43k ± 0% ~ (p=0.971 n=10) 32.45k ± 0% ~ (p=0.517 n=10) 32.44k ± 0% ~ (p=0.670 n=10) 32.46k ± 0% ~ (p=0.256 n=10) 32.41k ± 0% ~ (p=0.247 n=10) geomean 4.872k 4.887k +0.31% 4.870k -0.03% 4.885k +0.28% 4.880k +0.17% 4.875k +0.06% 4.876k +0.08%

Benchmark point to both 16, 32 and 128 to provide the best result. I think 128 is some kind of anomaly due to lucky memory aligment and think is too large. There is no much difference between 16 and 32. I leaned towards 32 as it's the same degree as etcd uses.

I didn't know etcd uses 32 - it just sounds a bit high to me.
Based on the results above, I would personally choose 16, but I don't have super strong opinion here.

Let's just document that in the PR description then (copy the benchmark results and rationale why we choose the value).

https://github.com/etcd-io/etcd/blob/bd8896364a1d9adb554b538232004e5b2d6c850a/server/storage/mvcc/index.go#L47

staging/src/k8s.io/apiserver/pkg/storage/cacher/store.go

staging/src/k8s.io/apiserver/pkg/storage/cacher/watch_cache.go

electsim/main.go

staging/src/k8s.io/apiserver/pkg/storage/cacher/store.go

Can be disabled via BtreeWatchCache feature flag.

wojtek-t · 2024-11-05T15:39:07Z

/lgtm
/approve

k8s-ci-robot · 2024-11-05T15:39:14Z

LGTM label has been added.

Git tree hash: 12a2d62ba5e7b4a32da21ee5ed81f0ffe1236bb6

k8s-ci-robot · 2024-11-05T15:39:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/features/OWNERS~~ [wojtek-t]
~~staging/src/k8s.io/apiserver/pkg/features/OWNERS~~ [wojtek-t]
~~staging/src/k8s.io/apiserver/pkg/storage/OWNERS~~ [wojtek-t]
~~test/featuregates_linter/test_data/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

serathius · 2024-11-18T16:33:09Z

cc @MadhavJivrajani @deads2k @jpbetz @wojtek-t

We got confirmation from scalability tests about the performance improvement

dims · 2024-11-18T16:55:09Z

@serathius Very Very Cool! thanks @MadhavJivrajani and folks!

MadhavJivrajani · 2024-11-19T02:12:42Z

This is AMAZING!
Thank you so much! ❤️

k8s-ci-robot requested review from ingvagabund and jpbetz October 29, 2024 15:18

dims reviewed Oct 29, 2024

View reviewed changes

serathius force-pushed the watchcache-btree-2 branch from 083858f to 5b6f39c Compare October 29, 2024 19:11

serathius force-pushed the watchcache-btree-2 branch from 5b6f39c to 4b80c95 Compare October 29, 2024 19:14

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 29, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 29, 2024

k8s-ci-robot requested review from deads2k and wojtek-t November 2, 2024 10:08

wojtek-t reviewed Nov 4, 2024

View reviewed changes

wojtek-t self-assigned this Nov 4, 2024

serathius force-pushed the watchcache-btree-2 branch 5 times, most recently from ab8859d to 7a8de88 Compare November 5, 2024 11:21

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2024

wojtek-t reviewed Nov 5, 2024

View reviewed changes

electsim/main.go Outdated Show resolved Hide resolved

staging/src/k8s.io/apiserver/pkg/storage/cacher/store.go Show resolved Hide resolved

Use btree for watch cache storage to serve LIST more efficiently

5ea427e

Can be disabled via BtreeWatchCache feature flag.

serathius force-pushed the watchcache-btree-2 branch from 7a8de88 to 5ea427e Compare November 5, 2024 15:28

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2024

k8s-ci-robot merged commit 19d6337 into kubernetes:master Nov 5, 2024
16 checks passed

k8s-ci-robot added this to the v1.32 milestone Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use btree for watch cache storage to serve LIST more efficiently #128415

Use btree for watch cache storage to serve LIST more efficiently #128415

serathius commented Oct 29, 2024 •

edited

Loading

serathius commented Oct 29, 2024

dims Oct 29, 2024

serathius Oct 29, 2024

dims Oct 29, 2024

serathius commented Oct 29, 2024

fedebongio commented Oct 29, 2024

mengqiy commented Oct 30, 2024

sftim commented Oct 31, 2024

serathius commented Nov 2, 2024

wojtek-t Nov 4, 2024

serathius Nov 4, 2024

wojtek-t Nov 4, 2024

serathius Nov 4, 2024 •

edited

Loading

wojtek-t Nov 4, 2024

wojtek-t Nov 5, 2024

serathius Nov 5, 2024

serathius Nov 5, 2024

wojtek-t Nov 5, 2024

serathius Nov 5, 2024

wojtek-t commented Nov 5, 2024

k8s-ci-robot commented Nov 5, 2024

k8s-ci-robot commented Nov 5, 2024

serathius commented Nov 18, 2024

dims commented Nov 18, 2024

MadhavJivrajani commented Nov 19, 2024

Use btree for watch cache storage to serve LIST more efficiently #128415

Use btree for watch cache storage to serve LIST more efficiently #128415

Conversation

serathius commented Oct 29, 2024 • edited Loading

serathius commented Oct 29, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

serathius commented Oct 29, 2024

fedebongio commented Oct 29, 2024

mengqiy commented Oct 30, 2024

sftim commented Oct 31, 2024

serathius commented Nov 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

serathius Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wojtek-t commented Nov 5, 2024

k8s-ci-robot commented Nov 5, 2024

k8s-ci-robot commented Nov 5, 2024

serathius commented Nov 18, 2024

dims commented Nov 18, 2024

MadhavJivrajani commented Nov 19, 2024

serathius commented Oct 29, 2024 •

edited

Loading

serathius Nov 4, 2024 •

edited

Loading