Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use btree for watch cache storage to serve LIST more efficiently #128415

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

serathius
Copy link
Contributor

@serathius serathius commented Oct 29, 2024

Adopted a new implementation of watch caches for **list** verbs, using a btree data structure. The new implementation is active by default; you can opt out by disabling the `BtreeWatchCache` feature gate.

Follow up from #126754

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 29, 2024
@serathius
Copy link
Contributor Author

/sig api-machinery
/kind feature

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/feature Categorizes issue or PR as related to a new feature. area/apiserver and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 29, 2024
func newStoreIndexer(indexers *cache.Indexers) storeIndexer {
return cache.NewIndexer(storeElementKey, storeElementIndexers(indexers))
if utilfeature.DefaultFeatureGate.Enabled(features.BtreeWatchCache) {
return newThreadedBtreeStoreIndexer(storeElementIndexers(indexers), 32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's highlight 32 as a constant please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOne

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you! especially for adding more notes on how you got that number

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. area/cloudprovider area/dependency Issues or PRs related to dependency changes sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 29, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 29, 2024
@serathius
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-100-performance

@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 29, 2024
@mengqiy
Copy link
Member

mengqiy commented Oct 30, 2024

Great to see this new feature! Thanks @serathius.
I wonder if there are any new metrics that have been added. I don't see any in #126754

@sftim
Copy link
Contributor

sftim commented Oct 31, 2024

Changelog suggestion

-Use btree for watch cache storage to serve LIST more efficiently. Can be disabled via BtreeWatchCache feature flag.
+Adopted a new implementation of watch caches for **list** verbs, using a btree data structure. The new implementation is active by default; you can opt out by disabling the `BtreeWatchCache` feature gate.

In docs we write HTTP-level verbs (eg POST) in all-uppercase, and API-level logical verbs (eg create) in bold lower case.

@serathius
Copy link
Contributor Author

/cc @wojtek-t @deads2k @jpbetz

@@ -96,6 +96,12 @@ const (
// This feature is currently PRE-ALPHA and MUST NOT be enabled outside of integration tests.
TestOnlyCBORServingAndStorage featuregate.Feature = "TestOnlyCBORServingAndStorage"

// owner: @serathius
// beta: v1.32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer add information about stages here, as this is now easy to get from VersionedSpec below.

But please add a link to kep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no KEP, got agreement from api machinery leads to move forward with btree without KEP until we want to extend caching to more API surface.

"k8s.io/client-go/tools/cache"
)

const (
// btreeDegree defines the degree of btree storage.
// Determined based on running BenchmarkStore, higher values bring diminishing benefits for LIST, while increasing cost of updates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the difference looks between e.g. 16 and 32 then?
Can you provide a bit more detail on how exactly this value was determined?

Copy link
Contributor Author

@serathius serathius Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I selected 32 based during original PoC for serving exact requests from watch cache. If I remember correctly the benchmark showed that snapshotting btree gets more and more costly as degree, and 32 was a nice balance between them. It also nicely matched the degree of btree used by etcd.

Tried to repeat the tests assuming only the current state of the code however, as we don't snapshot btree there no negative factor of using the higher degree. Increasing the degree just brings diminishing improvements.

                                       │ btree-8.txt │            btree-16.txt            │            btree-32.txt            │            btree-64.txt            │
                                       │   sec/op    │   sec/op     vs base               │   sec/op     vs base               │   sec/op     vs base               │
StoreListCreate/RV=NotOlderThan-12       238.2µ ± 3%   231.5µ ± 3%  -2.82% (p=0.015 n=10)   224.1µ ± 4%  -5.92% (p=0.001 n=10)   221.1µ ± 2%  -7.16% (p=0.000 n=10)
StoreList/RV=NotOlderThan-12             679.3µ ± 2%   682.3µ ± 1%       ~ (p=0.579 n=10)   682.3µ ± 2%       ~ (p=0.796 n=10)   677.1µ ± 2%       ~ (p=0.247 n=10)
StoreList/Paginate/RV=NotOlderThan-12    16.47m ± 1%   16.51m ± 1%       ~ (p=0.853 n=10)   16.55m ± 1%       ~ (p=0.052 n=10)   16.43m ± 1%       ~ (p=0.393 n=10)
StoreList/Namespace/RV=NotOlderThan-12   973.0µ ± 5%   978.2µ ± 4%       ~ (p=0.631 n=10)   976.1µ ± 3%       ~ (p=1.000 n=10)   959.4µ ± 3%       ~ (p=0.436 n=10)
geomean                                  1.269m        1.264m       -0.42%                  1.254m       -1.22%                  1.239m       -2.34%

                                       │ btree-8.txt  │            btree-16.txt             │            btree-32.txt             │            btree-64.txt             │
                                       │     B/op     │     B/op      vs base               │     B/op      vs base               │     B/op      vs base               │
StoreListCreate/RV=NotOlderThan-12       239.6Ki ± 7%   241.1Ki ± 6%       ~ (p=0.971 n=10)   242.6Ki ± 3%       ~ (p=0.436 n=10)   241.3Ki ± 3%       ~ (p=0.912 n=10)
StoreList/RV=NotOlderThan-12             6.007Mi ± 0%   6.007Mi ± 0%       ~ (p=0.724 n=10)   6.007Mi ± 0%       ~ (p=0.739 n=10)   6.007Mi ± 0%       ~ (p=0.839 n=10)
StoreList/Paginate/RV=NotOlderThan-12    64.19Mi ± 0%   64.19Mi ± 0%       ~ (p=0.971 n=10)   64.20Mi ± 0%       ~ (p=0.123 n=10)   64.19Mi ± 0%       ~ (p=0.796 n=10)
StoreList/Namespace/RV=NotOlderThan-12   6.523Mi ± 1%   6.523Mi ± 1%       ~ (p=0.796 n=10)   6.524Mi ± 1%       ~ (p=0.093 n=10)   6.523Mi ± 1%       ~ (p=0.481 n=10)
geomean                                  4.925Mi        4.933Mi       +0.16%                  4.941Mi       +0.31%                  4.934Mi       +0.18%

                                       │ btree-8.txt │            btree-16.txt            │            btree-32.txt            │            btree-64.txt            │
                                       │  allocs/op  │  allocs/op   vs base               │  allocs/op   vs base               │  allocs/op   vs base               │
StoreListCreate/RV=NotOlderThan-12        559.0 ± 0%    559.0 ± 0%       ~ (p=1.000 n=10)    559.0 ± 0%       ~ (p=0.635 n=10)    558.5 ± 0%  -0.09% (p=0.029 n=10)
StoreList/RV=NotOlderThan-12              87.50 ± 1%    87.50 ± 1%       ~ (p=1.000 n=10)    87.00 ± 1%       ~ (p=0.350 n=10)    87.00 ± 1%       ~ (p=1.000 n=10)
StoreList/Paginate/RV=NotOlderThan-12    492.9k ± 0%   492.7k ± 0%  -0.04% (p=0.023 n=10)   492.9k ± 0%       ~ (p=0.542 n=10)   492.7k ± 0%  -0.05% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12   6.214k ± 3%   6.213k ± 3%       ~ (p=0.869 n=10)   6.219k ± 2%       ~ (p=0.100 n=10)   6.210k ± 3%       ~ (p=0.445 n=10)
geomean                                  3.499k        3.498k       -0.01%                  3.494k       -0.12%                  3.492k       -0.19%

What would you preferred way of documenting this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would first want to understand how it was even chosen.
The way I look at it, I see a diminishing return even when going to 16 and 32, not just 64.
Why do you claim 32 as a sweet spot?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now my only remaining question in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated the tests

                                                    │      2       │                 4                  │                 8                  │                 16                  │                 32                  │                 64                  │                 128                 │
                                                    │    sec/op    │   sec/op     vs base               │   sec/op     vs base               │   sec/op     vs base                │   sec/op     vs base                │   sec/op     vs base                │   sec/op     vs base                │
StoreCreateList/RV=NotOlderThan-24                    473.0µ ± 11%   430.1µ ± 9%  -9.08% (p=0.005 n=10)   427.9µ ± 6%  -9.54% (p=0.002 n=10)   403.9µ ± 8%  -14.62% (p=0.000 n=10)   401.0µ ± 4%  -15.22% (p=0.000 n=10)   408.0µ ± 4%  -13.75% (p=0.000 n=10)   385.9µ ± 4%  -18.42% (p=0.000 n=10)
StoreCreateList/RV=ExactMatch-24                      604.7µ ±  4%   596.7µ ± 8%       ~ (p=0.529 n=10)   604.6µ ± 4%       ~ (p=0.971 n=10)   601.1µ ± 4%        ~ (p=0.853 n=10)   611.0µ ± 6%        ~ (p=0.105 n=10)   598.2µ ± 5%        ~ (p=0.579 n=10)   608.2µ ± 3%        ~ (p=0.796 n=10)
StoreList/List=All/Paginate=False/RV=Empty-24         729.1µ ±  5%   692.9µ ± 3%  -4.96% (p=0.002 n=10)   693.7µ ± 3%  -4.86% (p=0.000 n=10)   688.3µ ± 1%   -5.59% (p=0.000 n=10)   690.4µ ± 5%   -5.31% (p=0.002 n=10)   689.7µ ± 2%   -5.40% (p=0.000 n=10)   687.8µ ± 3%   -5.67% (p=0.000 n=10)
StoreList/List=All/Paginate=True/RV=Empty-24          19.51m ±  2%   19.84m ± 2%       ~ (p=0.105 n=10)   19.89m ± 3%       ~ (p=0.190 n=10)   19.64m ± 4%        ~ (p=0.853 n=10)   19.34m ± 4%        ~ (p=0.481 n=10)   20.22m ± 4%   +3.66% (p=0.007 n=10)   19.58m ± 4%        ~ (p=0.912 n=10)
StoreList/List=Namespace/Paginate=False/RV=Empty-24   1.672m ±  4%   1.635m ± 2%       ~ (p=0.247 n=10)   1.673m ± 5%       ~ (p=0.631 n=10)   1.657m ± 2%        ~ (p=0.971 n=10)   1.656m ± 4%        ~ (p=0.739 n=10)   1.678m ± 2%        ~ (p=0.631 n=10)   1.718m ± 8%        ~ (p=0.105 n=10)
geomean                                               1.467m         1.420m       -3.24%                  1.430m       -2.58%                  1.403m        -4.38%                  1.402m        -4.46%                  1.417m        -3.44%                  1.403m        -4.41%

                                                    │       2       │                   4                   │                  8                   │                  16                  │                  32                   │                   64                   │                  128                   │
                                                    │     B/op      │      B/op       vs base               │     B/op       vs base               │     B/op       vs base               │     B/op       vs base                │      B/op       vs base                │      B/op       vs base                │
StoreCreateList/RV=NotOlderThan-24                    98.58Ki ± 11%   101.33Ki ± 13%       ~ (p=0.280 n=10)   99.80Ki ± 26%       ~ (p=0.353 n=10)   109.63Ki ± 9%       ~ (p=0.075 n=10)   112.56Ki ± 6%  +14.18% (p=0.007 n=10)   114.41Ki ± 10%  +16.05% (p=0.003 n=10)   115.06Ki ± 12%  +16.72% (p=0.011 n=10)
StoreCreateList/RV=ExactMatch-24                      117.1Ki ±  0%    117.5Ki ±  0%       ~ (p=0.218 n=10)   116.9Ki ±  0%       ~ (p=0.052 n=10)    117.3Ki ± 0%       ~ (p=0.353 n=10)    116.9Ki ± 0%        ~ (p=0.075 n=10)    117.0Ki ±  0%        ~ (p=0.436 n=10)    117.0Ki ±  0%        ~ (p=0.280 n=10)
StoreList/List=All/Paginate=False/RV=Empty-24         6.023Mi ±  0%    6.024Mi ±  0%  +0.01% (p=0.037 n=10)   6.024Mi ±  0%       ~ (p=0.493 n=10)    6.024Mi ± 0%  +0.01% (p=0.035 n=10)    6.024Mi ± 0%        ~ (p=0.247 n=10)    6.024Mi ±  0%        ~ (p=0.247 n=10)    6.024Mi ±  0%        ~ (p=0.315 n=10)
StoreList/List=All/Paginate=True/RV=Empty-24          64.22Mi ±  0%    64.21Mi ±  0%       ~ (p=0.075 n=10)   64.23Mi ±  0%       ~ (p=0.280 n=10)    64.21Mi ± 0%  -0.02% (p=0.002 n=10)    64.22Mi ± 0%        ~ (p=0.579 n=10)    64.22Mi ±  0%        ~ (p=0.971 n=10)    64.22Mi ±  0%        ~ (p=1.000 n=10)
StoreList/List=Namespace/Paginate=False/RV=Empty-24   8.177Mi ±  0%    8.178Mi ±  0%       ~ (p=0.579 n=10)   8.177Mi ±  0%       ~ (p=0.971 n=10)    8.179Mi ± 0%       ~ (p=0.579 n=10)    8.178Mi ± 0%        ~ (p=0.739 n=10)    8.179Mi ±  0%        ~ (p=0.315 n=10)    8.176Mi ±  0%        ~ (p=0.247 n=10)
geomean                                               2.034Mi          2.047Mi        +0.61%                  2.039Mi        +0.22%                   2.079Mi       +2.19%                   2.088Mi        +2.66%                   2.095Mi         +3.01%                   2.098Mi         +3.12%

                                                    │      2      │                 4                  │                 8                  │                 16                 │                 32                 │                 64                 │                128                 │
                                                    │  allocs/op  │  allocs/op   vs base               │  allocs/op   vs base               │  allocs/op   vs base               │  allocs/op   vs base               │  allocs/op   vs base               │  allocs/op   vs base               │
StoreCreateList/RV=NotOlderThan-24                     560.0 ± 0%    558.0 ± 0%  -0.36% (p=0.000 n=10)    557.0 ± 0%  -0.54% (p=0.000 n=10)    558.0 ± 0%  -0.36% (p=0.000 n=10)    557.0 ± 0%  -0.54% (p=0.000 n=10)    557.0 ± 0%  -0.54% (p=0.000 n=10)    557.0 ± 0%  -0.54% (p=0.000 n=10)
StoreCreateList/RV=ExactMatch-24                       871.0 ± 0%    870.0 ± 0%  -0.11% (p=0.038 n=10)    870.0 ± 0%  -0.11% (p=0.004 n=10)    870.0 ± 0%  -0.11% (p=0.005 n=10)    869.0 ± 0%  -0.23% (p=0.000 n=10)    870.0 ± 0%  -0.11% (p=0.001 n=10)    870.0 ± 0%  -0.11% (p=0.000 n=10)
StoreList/List=All/Paginate=False/RV=Empty-24          351.0 ± 3%    358.0 ± 1%  +1.99% (p=0.034 n=10)    352.5 ± 3%       ~ (p=0.589 n=10)    358.5 ± 1%  +2.14% (p=0.022 n=10)    356.5 ± 3%       ~ (p=0.208 n=10)    355.0 ± 3%       ~ (p=0.224 n=10)    355.0 ± 3%       ~ (p=0.183 n=10)
StoreList/List=All/Paginate=True/RV=Empty-24          494.4k ± 0%   494.4k ± 0%       ~ (p=0.424 n=10)   494.6k ± 0%  +0.06% (p=0.000 n=10)   492.7k ± 0%  -0.34% (p=0.000 n=10)   494.5k ± 0%  +0.02% (p=0.009 n=10)   493.0k ± 0%  -0.28% (p=0.000 n=10)   494.4k ± 0%       ~ (p=0.424 n=10)
StoreList/List=Namespace/Paginate=False/RV=Empty-24   32.43k ± 0%   32.44k ± 0%       ~ (p=0.579 n=10)   32.43k ± 0%       ~ (p=0.971 n=10)   32.45k ± 0%       ~ (p=0.517 n=10)   32.44k ± 0%       ~ (p=0.670 n=10)   32.46k ± 0%       ~ (p=0.256 n=10)   32.41k ± 0%       ~ (p=0.247 n=10)
geomean                                               4.872k        4.887k       +0.31%                  4.870k       -0.03%                  4.885k       +0.28%                  4.880k       +0.17%                  4.875k       +0.06%                  4.876k       +0.08%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark point to both 16, 32 and 128 to provide the best result. I think 128 is some kind of anomaly due to lucky memory aligment and think is too large. There is no much difference between 16 and 32. I leaned towards 32 as it's the same degree as etcd uses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know etcd uses 32 - it just sounds a bit high to me.
Based on the results above, I would personally choose 16, but I don't have super strong opinion here.

Let's just document that in the PR description then (copy the benchmark results and rationale why we choose the value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

staging/src/k8s.io/apiserver/pkg/storage/cacher/store.go Outdated Show resolved Hide resolved
@wojtek-t wojtek-t self-assigned this Nov 4, 2024
@serathius serathius force-pushed the watchcache-btree-2 branch 5 times, most recently from ab8859d to 7a8de88 Compare November 5, 2024 11:21
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2024
Can be disabled via BtreeWatchCache feature flag.
@wojtek-t
Copy link
Member

wojtek-t commented Nov 5, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 12a2d62ba5e7b4a32da21ee5ed81f0ffe1236bb6

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2024
@k8s-ci-robot k8s-ci-robot merged commit 19d6337 into kubernetes:master Nov 5, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Nov 5, 2024
@serathius
Copy link
Contributor Author

cc @MadhavJivrajani @deads2k @jpbetz @wojtek-t

We got confirmation from scalability tests about the performance improvement

image

@dims
Copy link
Member

dims commented Nov 18, 2024

@serathius Very Very Cool! thanks @MadhavJivrajani and folks!

@MadhavJivrajani
Copy link
Contributor

This is AMAZING!
Thank you so much! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/dependency Issues or PRs related to dependency changes cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants