improve trace client performance using concurrent resource load #5726
Conversation
A totally bad benchmark. This only serves to show that increasing concurrency does matter. Some numbers on my laptop after hard-coding the concurrency:
As you can see, at some point increasing the concurrency stops making a difference and there are diminishing returns.
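The pattern being benchmarked can be sketched as a minimal worker pool. This is an illustrative stand-in, not the PR's actual API: `workItem` and `loadAll` are hypothetical names, and the `load` callback simulates a network fetch.

```go
package main

import (
	"fmt"
	"sync"
)

// workItem is a hypothetical stand-in for a resource reference to load.
type workItem struct{ name string }

// loadAll fans items out to a fixed number of workers draining a channel.
// For latency-bound loads, wall time shrinks roughly in proportion to
// concurrency, until another bottleneck (client QPS limits, server
// capacity) takes over -- the diminishing returns seen in the benchmark.
func loadAll(items []workItem, concurrency int, load func(workItem) string) []string {
	ch := make(chan workItem)
	results := make([]string, 0, len(items))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for it := range ch {
				r := load(it) // simulated network fetch
				mu.Lock()
				results = append(results, r)
				mu.Unlock()
			}
		}()
	}
	for _, it := range items {
		ch <- it
	}
	close(ch)
	wg.Wait()
	return results // completion order is nondeterministic
}

func main() {
	items := []workItem{{"xr"}, {"composed-1"}, {"composed-2"}}
	got := loadAll(items, 2, func(w workItem) string { return "loaded:" + w.name })
	fmt.Println(len(got))
}
```

Because workers finish in arbitrary order, results have to be re-sorted afterwards, which is why the PR sorts children after the concurrent load.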
Force-pushed from 389d7cc to b5d8a44.
Interesting - lint is failing for Go 1.22 int range usage.
Seems to me that the linter error is spurious, since the code doesn't compile with that change (given 1.21 in go.mod).
go.mod's directive is only the minimum required version; what matters is the version of the go binary, which is set to 1.22.3 in CI. I have to go through it more thoroughly, but in the meantime thanks for your contribution! Just a breadcrumb: another possible optimization I left behind during the initial implementation was to use the cached discovery client; unfortunately, wiring up all the usual Kubernetes flags wasn't so straightforward given that we aren't using pflag/cobra. That would probably reduce the need to fine-tune the QPS and burst settings.
Not able to compile locally even with go 1.22.3.
🤔 You are right, I didn't realise what it was complaining about 🤔
Definitely curious what the performance would be in your 500+ resource scenario @gotwarlost after #5737 is merged. That looks to be fixing a big recent regression in performance...
Second commit fixes test structure and sorts children after load. It also sets QPS and burst, but I will get rid of this with a rebase from master.
Not sure why the E2E tests are failing - the failing test cases seem unrelated to the changes in this PR.
Hello, how can I help move this PR forward? Please let me know.
Hi, sorry, I'll make sure to have a look at it tomorrow. In the meantime, could you rerun the benchmarks after the fix, as @jbw976 was suggesting?
Benchmarks for 3 cases: before the QPS change, after the QPS change, and the current PR. Note that the QPS changes do not affect performance, since access to resources is sequential with the current head.

Nice @gotwarlost, thank you for running these benchmarks in your environment; that definitely shows an improvement when running with more concurrency! 💪 Interesting to see that b404cd0 had no effect for you, but your explanation makes sense for environments where the RTT of each serialized network request is significant enough. I wanted to take these changes for a test spin as well, and I got different results in my environment, which uses just a local kind cluster with minimal network RTT.

TL;DR: Higher QPS/burst (b404cd0) does make a significant difference for me in this local kind scenario, while higher concurrency doesn't make much difference. I still think we should proceed with your general approach of adding concurrency support, since it clearly makes an improvement for more real-world environments 🎉, but it is still interesting to note this behavior difference 🤓

To set up my perf testing scenario, I created a test repo with 250 resources:

- Test 914b83a (QPS and burst missing): 49 seconds
- Test b404cd0 (QPS and burst settings restored): 11.8 seconds
- Test this PR #5726 with the new concurrency setting: min 11.08 seconds @ 8 concurrency, max 11.89 seconds @ 1 concurrency

I will take a full code review pass through the code changes as well, but I wanted to share those performance findings, as well as the reusable https://github.com/jbw976/xtrace-perf testing repo that could benefit others. Once again, I don't think these results indicate we shouldn't accept this change, as they are for a less important and less common local laptop cluster environment, with different network RTT characteristics. The concurrency support should definitely be beneficial for remote clusters 🙇‍♂️
Those are interesting findings and certainly way different from what I have. I've noticed in the past that the network round-trip time massively affects performance. On a kind cluster where we run E2E tests, I had never noticed a performance problem (stuff loads in a few seconds); it was only when running the command on EKS clusters etc., with higher network latency, that all these settings became material. My hypothesis is that resources load so fast in your kind cluster that the QPS/burst of the client becomes the bottleneck, because it is executing many more k8s requests per second than it would otherwise do with network latency. I'll re-run my tests for the facade composition on a kind cluster and report here. I suspect the behavior will be closer to what you have than what I previously published.
Exactly @gotwarlost, I 💯 believe that hypothesis as well! It makes logical sense 🙇‍♂️
Yes, I can reproduce results similar to yours for a kind cluster with ~480 resources.
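The hypothesis in this exchange can be checked with back-of-the-envelope arithmetic: wall time for n requests is bounded below by both the latency bound (n x RTT / concurrency) and the client rate limit (n / QPS), and whichever is larger dominates. A rough sketch with illustrative numbers, not measurements:

```go
package main

import "fmt"

// estimateSeconds is a crude lower bound on wall time for n requests:
// the larger of the latency bound (n*rtt/concurrency) and the rate-limit
// bound (n/qps) dominates. Purely illustrative arithmetic.
func estimateSeconds(n int, rttSec float64, concurrency int, qps float64) float64 {
	latencyBound := float64(n) * rttSec / float64(concurrency)
	rateBound := float64(n) / qps
	if latencyBound > rateBound {
		return latencyBound
	}
	return rateBound
}

func main() {
	// Remote cluster, RTT 100ms, QPS 20, 500 resources:
	fmt.Println(estimateSeconds(500, 0.1, 1, 20)) // latency-bound: ~50s
	fmt.Println(estimateSeconds(500, 0.1, 8, 20)) // now QPS-bound: ~25s
	// Local kind cluster, RTT ~1ms: QPS is the bottleneck regardless.
	fmt.Println(estimateSeconds(500, 0.001, 1, 20)) // ~25s
	fmt.Println(estimateSeconds(500, 0.001, 1, 50)) // ~10s
}
```

Under this model, raising concurrency helps only while the latency bound dominates, which matches the observation that QPS/burst, not concurrency, moved the needle on a low-RTT kind cluster.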
I took a pass through the code today too @gotwarlost, thanks again for this contribution! Most of my comments are around helping future readers of this code understand it quickly, i.e. simplifying things and adding some more plain-English comments to help people keep track of where they are within the nested concurrency/goroutines. Let me know what you think of these suggestions, thank you! 🙇‍♂️
fixes crossplane#5707 Add a loader with configurable concurrency to load resources in a concurrent manner. The xrm client delegates to the loader for resource load and supports a functional option to set the concurrency. Add a `--concurrency` flag for the `crank beta trace` command and configure the xrm client appropriately. Signed-off-by: gotwarlost <kananthmail-github@yahoo.com>
* sorts children after concurrent load using the same algorithm that crossplane uses to sort resource refs
* sets default QPS and burst, when not set, to the same values as a recent PR
* updates test structure to match existing tests

Signed-off-by: gotwarlost <kananthmail-github@yahoo.com>
Signed-off-by: gotwarlost <kananthmail-github@yahoo.com>
* remove context from the loader interface when not needed
* change member name from l to rl
* simplify tests to lose regexes
* add concurrency docs
* remove module global for channel capacity, pass in as parameter
* use range over ints instead of for loops since linter complains otherwise
* do not run goroutines when there are no child refs to add
* add more docs for sorting logic and why it is needed

Signed-off-by: gotwarlost <kananthmail-github@yahoo.com>
Thanks for the detailed PR review @jbw976 - I resolved all but one of your comments; the code is much better for it!
Awesome @gotwarlost, thank you very much for incorporating the feedback; I appreciate you taking the time to do that. I took it for another test spin today on a live cluster and verified all the items are being returned as expected across multiple runs, so it is all working reliably for me. I also ran the unit tests with `count=1000`, and those passed perfectly too, so I'm feeling pretty good about the correctness of this change.

Just one final question from me about the for loop variable gotcha - if you can confirm that we are in the clear there, that would be good!
```go
for _, ref := range refs {
	l.ch <- workItem{
		parent: parent,
		child:  ref,
	}
}
```
Is this susceptible to the infamous go for loop variable gotcha? https://dev.to/kkentzo/the-golang-for-loop-gotcha-1n35
Looking a bit further, it appears this was fixed with go 1.22, which we appear to be using: https://github.com/crossplane/crossplane/blob/master/go.mod#L3
Do we completely not have to worry about this anymore? or is there any further steps we would need to take? should we continue to be defensive about this regardless, out of habit/tradition? what do you think? 🤔
https://go.dev/blog/loopvar-preview
This code also works on Go 1.21 BTW, since there are no pointers or goroutines involved that use the loop variable outside its intended scope. Also, as you rightly said, Go 1.22 doesn't have this issue any more.

I can put a `ref := ref` in the loop if you think that is more consistent with the codebase, but I don't know that it is necessary.
Ah yes, looking again more closely, you are correct that this would work OK before Go 1.22 also! A copy of the `ref` variable here is sent through the channel, so `ref` is never used outside the scope of this loop 🙇‍♂️
No need to make a change here as far as I'm concerned!
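The reasoning above can be demonstrated with a small sketch (names here are illustrative): sending the loop variable into a channel copies its value at that instant, so nothing aliases `ref` after the iteration ends. The classic gotcha required a closure or pointer that outlived the loop body.

```go
package main

import "fmt"

// send copies each loop variable's value into a buffered channel. Because
// the channel receives a copy, this is safe even under pre-Go-1.22
// per-loop (rather than per-iteration) variable semantics.
func send(refs []string) []string {
	ch := make(chan string, len(refs))
	for _, ref := range refs {
		ch <- ref // value copied here; `ref` itself is never aliased
	}
	close(ch)
	out := make([]string, 0, len(refs))
	for v := range ch {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(send([]string{"a", "b", "c"})) // [a b c]
}
```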
Thanks again for a great contribution that will make real-world usage of `crossplane beta trace` much more performant! A great addition for v1.17 🤩
Description of your changes

Fixes #5707

Add a loader with configurable concurrency to load resources in a concurrent manner. The xrm client delegates to the loader for resource load and supports a functional option to set the concurrency. Add a `--concurrency` flag for the `crank beta trace` command and configure the xrm client appropriately.

I have:

- [ ] Run `make reviewable` to ensure this PR is ready for review.
- [ ] Added or updated e2e tests.
- [ ] Linked a PR or a [docs tracking issue] to [document this change].
- [ ] Added `backport release-x.y` labels to auto-backport this PR.
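The functional option mentioned in the description might be wired up roughly like the sketch below; `Client`, `Option`, and `WithConcurrency` are illustrative names under assumed semantics, not the actual crossplane xrm API.

```go
package main

import "fmt"

// Client is a hypothetical stand-in for the xrm client.
type Client struct {
	concurrency int
}

// Option is the functional-option type: a function that mutates the
// client during construction.
type Option func(*Client)

// WithConcurrency sets how many resource loads may run at once.
func WithConcurrency(n int) Option {
	return func(c *Client) { c.concurrency = n }
}

// NewClient applies options over sensible defaults, so callers that
// don't pass --concurrency get sequential behavior.
func NewClient(opts ...Option) *Client {
	c := &Client{concurrency: 1} // sequential by default
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	c := NewClient(WithConcurrency(8))
	fmt.Println(c.concurrency) // 8
}
```

This pattern lets the `--concurrency` CLI flag map onto a single option at the construction site while keeping the client's zero-configuration path unchanged.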