Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementation of watch cache using btree data structure. #126754

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

serathius
Copy link
Contributor

@serathius serathius commented Aug 17, 2024

Continuing work of #108392

Started from taking the exact implementation by @MadhavJivrajani, however updated the btree factor to 32.

Using btree without any changes already is faster than existing ThreadedSafeStore. There are still many improvements that could be implemented.

Btree is awesome for watch cache due to:

  • Support key ranges
  • Returns ordered results

Also etcd uses btree to serve ranges too.

Performance improvement:

  • 25% improvement in operations per second
  • 15% less bytes allocated
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/storage/cacher
cpu: AMD Ryzen 5 7600X 6-Core Processor
                                                  │  base.txt   │            btree.txt             │
                                                  │   sec/op    │   sec/op     vs base                │
StoreListCreate/RV=NotOlderThan-12                  698.8µ ± 4%   227.9µ ± 2%  -67.39% (p=0.000 n=10)
StoreListCreate/RV=ExactMatch-12                    356.4µ ± 2%   358.7µ ± 2%        ~ (p=0.631 n=10)
StoreList/RV=Empty-12                               982.5µ ± 6%   722.9µ ± 4%  -26.43% (p=0.000 n=10)
StoreList/RV=NotOlderThan-12                        972.3µ ± 2%   700.7µ ± 2%  -27.93% (p=0.000 n=10)
StoreList/RV=MatchExact-12                          16.55m ± 3%   16.38m ± 2%        ~ (p=0.218 n=10)
StoreList/Paginate/RV=Empty-12                      18.55m ± 1%   18.49m ± 1%        ~ (p=0.353 n=10)
StoreList/Paginate/RV=NotOlderThan-12               18.39m ± 3%   18.45m ± 1%        ~ (p=0.971 n=10)
StoreList/Paginate/RV=MatchExact-12                 18.28m ± 1%   18.47m ± 1%   +1.03% (p=0.003 n=10)
StoreList/NodeIndexed/RV=Empty-12                   1.791m ± 3%   1.796m ± 2%        ~ (p=0.796 n=10)
StoreList/NodeIndexed/RV=NotOlderThan-12            1.234m ± 4%   1.250m ± 3%        ~ (p=0.796 n=10)
StoreList/NodeIndexed/RV=MatchExact-12               7.460 ± 1%    7.455 ± 1%        ~ (p=0.796 n=10)
StoreList/NodeIndexed/Paginate/RV=Empty-12          70.41m ± 1%   64.66m ± 4%   -8.17% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=NotOlderThan-12   69.42m ± 1%   63.59m ± 1%   -8.39% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=MatchExact-12      8.077 ± 1%    8.088 ± 1%        ~ (p=0.165 n=10)
StoreList/Namespace/RV=Empty-12                     6.421m ± 1%   1.675m ± 1%  -73.91% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12              5.670m ± 2%   1.069m ± 1%  -81.14% (p=0.000 n=10)
StoreList/Namespace/RV=MatchExact-12                15.84m ± 2%   16.33m ± 3%   +3.09% (p=0.000 n=10)
geomean                                             13.05m        9.789m       -24.98%

                                                  │   base.txt    │             btree.txt             │
                                                  │     B/op      │     B/op      vs base                │
StoreListCreate/RV=NotOlderThan-12                   293.7Ki ± 4%   233.1Ki ± 4%  -20.65% (p=0.000 n=10)
StoreListCreate/RV=ExactMatch-12                     115.0Ki ± 0%   114.9Ki ± 0%   -0.09% (p=0.000 n=10)
StoreList/RV=Empty-12                                6.179Mi ± 0%   6.023Mi ± 0%   -2.53% (p=0.000 n=10)
StoreList/RV=NotOlderThan-12                         6.163Mi ± 0%   6.007Mi ± 0%   -2.53% (p=0.000 n=10)
StoreList/RV=MatchExact-12                           60.32Mi ± 0%   60.29Mi ± 0%        ~ (p=0.529 n=10)
StoreList/Paginate/RV=Empty-12                       64.29Mi ± 0%   64.05Mi ± 0%   -0.37% (p=0.000 n=10)
StoreList/Paginate/RV=NotOlderThan-12                64.27Mi ± 0%   64.04Mi ± 0%   -0.35% (p=0.000 n=10)
StoreList/Paginate/RV=MatchExact-12                  63.64Mi ± 0%   63.56Mi ± 0%   -0.12% (p=0.000 n=10)
StoreList/NodeIndexed/RV=Empty-12                    7.928Mi ± 1%   7.929Mi ± 1%   +0.01% (p=0.035 n=10)
StoreList/NodeIndexed/RV=NotOlderThan-12             6.328Mi ± 0%   6.328Mi ± 0%        ~ (p=0.739 n=10)
StoreList/NodeIndexed/RV=MatchExact-12               5.785Gi ± 0%   5.782Gi ± 0%        ~ (p=0.315 n=10)
StoreList/NodeIndexed/Paginate/RV=Empty-12           284.0Mi ± 0%   257.0Mi ± 0%   -9.51% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=NotOlderThan-12    282.2Mi ± 0%   255.3Mi ± 0%   -9.55% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=MatchExact-12      6.073Gi ± 0%   6.048Gi ± 0%   -0.42% (p=0.000 n=10)
StoreList/Namespace/RV=Empty-12                     23.745Mi ± 0%   8.123Mi ± 0%  -65.79% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12              22.143Mi ± 0%   6.521Mi ± 0%  -70.55% (p=0.000 n=10)
StoreList/Namespace/RV=MatchExact-12                 63.39Mi ± 0%   63.32Mi ± 0%   -0.11% (p=0.000 n=10)
geomean                                              33.76Mi        28.64Mi       -15.15%

                                                  │  base.txt   │            btree.txt            │
                                                  │  allocs/op  │  allocs/op   vs base               │
StoreListCreate/RV=NotOlderThan-12                   561.0 ± 0%    559.0 ± 0%  -0.36% (p=0.000 n=10)
StoreListCreate/RV=ExactMatch-12                     874.5 ± 0%    875.0 ± 0%       ~ (p=0.407 n=10)
StoreList/RV=Empty-12                                346.5 ± 1%    345.0 ± 2%       ~ (p=0.236 n=10)
StoreList/RV=NotOlderThan-12                         88.00 ± 1%    87.00 ± 1%  -1.14% (p=0.002 n=10)
StoreList/RV=MatchExact-12                          460.6k ± 0%   456.8k ± 0%  -0.83% (p=0.000 n=10)
StoreList/Paginate/RV=Empty-12                      490.6k ± 0%   486.8k ± 0%  -0.77% (p=0.000 n=10)
StoreList/Paginate/RV=NotOlderThan-12               490.4k ± 0%   486.6k ± 0%  -0.78% (p=0.000 n=10)
StoreList/Paginate/RV=MatchExact-12                 492.9k ± 0%   489.1k ± 0%  -0.77% (p=0.000 n=10)
StoreList/NodeIndexed/RV=Empty-12                   32.08k ± 0%   32.10k ± 0%       ~ (p=0.061 n=10)
StoreList/NodeIndexed/RV=NotOlderThan-12            6.614k ± 1%   6.614k ± 1%       ~ (p=0.642 n=10)
StoreList/NodeIndexed/RV=MatchExact-12              46.96M ± 0%   46.58M ± 0%  -0.81% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=Empty-12          2.147M ± 0%   1.948M ± 0%  -9.28% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=NotOlderThan-12   2.119M ± 0%   1.920M ± 0%  -9.38% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=MatchExact-12     47.93M ± 0%   47.46M ± 0%  -0.97% (p=0.000 n=10)
StoreList/Namespace/RV=Empty-12                     31.84k ± 0%   31.72k ± 0%  -0.39% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12              6.314k ± 0%   6.218k ± 0%  -1.53% (p=0.000 n=10)
StoreList/Namespace/RV=MatchExact-12                490.1k ± 0%   486.4k ± 0%  -0.76% (p=0.000 n=10)
geomean                                             78.82k        77.48k       -1.69%

/kind feature

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 17, 2024
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 17, 2024
@xuzhenglun
Copy link
Contributor

👍 really great work!

there is another #126753 seems duplicated? :)

@serathius serathius force-pushed the watchcache-btree branch 2 times, most recently from d086627 to 8a96c58 Compare August 17, 2024 15:09
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/cloudprovider area/dependency Issues or PRs related to dependency changes sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 17, 2024
@serathius
Copy link
Contributor Author

/retest

@serathius serathius force-pushed the watchcache-btree branch 3 times, most recently from 48e1292 to 795c90f Compare August 18, 2024 07:01
@serathius
Copy link
Contributor Author

Oh, found that original PoC had very inefficient implementation of Get (https://github.com/kubernetes/kubernetes/pull/108392/files#diff-7dbc4a86989f847d364c09b2370266542ccbb72edc66d0f346559771b1445e56R200-R211) that scanned whole keyspace. Fixed

@serathius serathius force-pushed the watchcache-btree branch 3 times, most recently from a3b8124 to 3c430e1 Compare August 18, 2024 15:01
@serathius serathius force-pushed the watchcache-btree branch 2 times, most recently from a2d0739 to 67f3f12 Compare October 24, 2024 17:27
@serathius
Copy link
Contributor Author

Just storage implementation using btree and tests. Removed integration to watch cache to avoid too many approvals on feature flags.

@serathius
Copy link
Contributor Author

Got to agreement about the indexer code. PTAL @jpbetz @deads2k @wojtek-t

@@ -53,6 +55,12 @@ type storeElement struct {
Fields fields.Set
}

func (t *storeElement) Less(than btree.Item) bool {
return t.Key < than.(*storeElement).Key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - this is risky. In theory not every btree.Item has to be storeElement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cacher yes, that's one of the assumption that allows us to simplify the code.

items = append(items, i.(interface{}))
return true
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in few more similar places below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


if limit == 0 {
s.tree.AscendGreaterOrEqual(&storeElement{Key: continueKey}, func(i btree.Item) bool {
elementKey := i.(*storeElement).Key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle i not being storeElement?

Copy link
Contributor Author

@serathius serathius Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btree store validates all inserted items are storeElement

@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 06b7f5e845b72661a9035b289a0ab22bc2425868

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2024
@k8s-ci-robot k8s-ci-robot requested a review from wojtek-t October 29, 2024 12:13
@serathius
Copy link
Contributor Author

Fixed typo caught by verify.

@wojtek-t
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2b01e9e4c7f1726b96bee7c10dbc0ba07066a988

@dims
Copy link
Member

dims commented Oct 29, 2024

@serathius can we reflect what you mentioned in an earlier comment in the release notes? It currently says Use btree data structure for watch cache but should it be New implementation of watch cache using btree data structure. Implementation is not enabled yet. (or something equivalent?).

PS: probably need to do the same for the PR title too.

@k8s-ci-robot k8s-ci-robot merged commit c83250d into kubernetes:master Oct 29, 2024
13 of 14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 29, 2024
@serathius serathius changed the title Reimplement watch cache storage with btree New implementation of watch cache using btree data structure. Oct 29, 2024
@serathius
Copy link
Contributor Author

@dims Done

@dims
Copy link
Member

dims commented Oct 29, 2024

thanks @serathius ! will avoid unnecessary questions :)

@sftim
Copy link
Contributor

sftim commented Oct 31, 2024

Changelog suggestion

-New implementation of watch cache using btree data structure. Implementation is not enabled yet.

If it's not enabled, I wouldn't changelog it.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/dependency Issues or PRs related to dependency changes cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.