test: perf worker #18007
base: main
Conversation
Interested in the results of this!
Force-pushed from a29e9a3 to 7729594
[Comment covering the test scenario, initial results, observations, and thoughts; details not captured in this extract.]
juju/internal/database/txn/transaction.go Lines 233 to 236 in 29bc8b5
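The referenced lines gate each transaction behind a semaphore. As a hedged reconstruction of that pattern (assuming golang.org/x/sync/semaphore; the names here are stand-ins, the actual code is at the permalink above):

```go
package txn

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// transactionRunner is a hypothetical stand-in for the runner at the
// permalink above: every transaction must first acquire a slot from a
// weighted semaphore, capping how many can hit dqlite concurrently.
type transactionRunner struct {
	sem *semaphore.Weighted // e.g. semaphore.NewWeighted(24)
}

// Txn blocks until a semaphore slot is free (or the caller's context
// is cancelled), then runs fn while holding the slot.
func (r *transactionRunner) Txn(ctx context.Context, fn func(context.Context) error) error {
	if err := r.sem.Acquire(ctx, 1); err != nil {
		return err
	}
	defer r.sem.Release(1)
	return fn(ctx)
}
```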
Force-pushed from 7729594 to 3c59e3b
Force-pushed from 3c59e3b to 68628ab
#18012: The atomic context can be pooled to prevent the creation of objects on the heap. There was originally a TODO about doing this, but it was deferred until the state base was finalized. A few services already use the atomic setup, so it feels like we should do this before it's forgotten. There should be no changes to the testing; this is just an implementation detail and should be transparent to the tests. This was observed when testing #18007.

## Checklist

- [x] Code style: imports ordered, good names, simple structure, etc
- [x] Comments saying why design decisions were made
- [x] Go unit tests, with comments saying what you're testing

## QA steps

Existing tests should pass. Testing that adding units causes no regressions is also a good check.

```sh
$ juju bootstrap lxd test
$ juju add-model default
$ juju deploy ubuntu
$ juju remove-application ubuntu
```

## Links

**Jira card:** https://warthogs.atlassian.net/browse/JUJU-6625
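For illustration, a minimal sketch of the pooling pattern #18012 describes, with hypothetical names (the real atomic context type lives in the Juju tree and carries more state):

```go
package domain

import (
	"context"
	"sync"
)

// atomicContext is a hypothetical stand-in for the pooled type in
// #18012; pooling avoids allocating a fresh object on the heap for
// every atomic operation.
type atomicContext struct {
	context.Context
}

var atomicCtxPool = sync.Pool{
	New: func() any { return &atomicContext{} },
}

// withAtomicContext borrows a wrapper from the pool, runs fn with it,
// then clears and returns the wrapper so it can be reused.
func withAtomicContext(ctx context.Context, fn func(*atomicContext) error) error {
	ac := atomicCtxPool.Get().(*atomicContext)
	ac.Context = ctx
	defer func() {
		ac.Context = nil // drop references before returning to the pool
		atomicCtxPool.Put(ac)
	}()
	return fn(ac)
}
```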
[Comment covering the test scenario, initial results, observations, and thoughts; details not captured in this extract.]
I ran another test without the semaphore and wasn't able to push past 800 transactions per second. I'll explore whether the semaphore should be tuned higher or lower. To get high numbers out of sqlite, it is advised to have a dedicated read connection. I'm unsure whether that would translate to dqlite, but we could potentially spike a prototype to see if it's feasible.
To compare apples to apples, we need a baseline. This starts by bringing over the perf-worker from 4.0. See: juju#18007
See the 3.6 results: #18131 (comment). Takeaway: it looks like mongo can hit a higher number of read queries, even when we account for raw queries vs queries inside of a transaction. The latter is unlikely to run 6 queries per transaction (more likely 2-3) to get to 8000+ queries per second. I've got another round of testing to perform, which might give us the apples vs apples comparison that we're looking for, but we might have to revisit the semaphore and retry solution to try and optimize it in the future.
It takes a lot to wire raw metrics all the way through, and the test suites are broken because manifolds in test suites expect things that aren't there. But it does show up in `juju_metrics`.
chore: expose metrics to the perf worker
If you use a normal prometheus.Counter, each one has to have a unique namespace, or you get a collision from the same Collector being registered twice. So we move to a CounterVec and include the model-uuid, which gets us around this. It is probably better to have it anyway.
The issue really was that we were registering the same (Name, Subsystem) multiple times. The only way around this is to push the registration higher up, above the perf worker, and have it create a CounterVec. We can re-use ForModel. I don't know that it is ideal, as it means the agent/engine knows about additional metrics that are specific to one worker. But we're just hacking it together here.
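For illustration, a hedged sketch of the CounterVec approach (the metric names here are made up; only the pattern matches the description above):

```go
package perf

import "github.com/prometheus/client_golang/prometheus"

// Registering two plain Counters with the same Name/Subsystem fails
// with a duplicate-collector error. Registering one CounterVec keyed
// on model-uuid sidesteps that: registration happens once, and each
// model only adds a label value.
var iterations = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "juju",
		Subsystem: "perf",
		Name:      "iterations_total",
		Help:      "Number of perf worker iterations run.",
	},
	[]string{"model_uuid"},
)

// forModel returns the counter bound to a single model's UUID.
func forModel(modelUUID string) prometheus.Counter {
	return iterations.WithLabelValues(modelUUID)
}
```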
One very positive note. All of this was
I ran the same setup: microk8s + cos-lite, with 3.6 and 4.0 reporting to that prometheus. Mongo stayed at 4000 req/s and dqlite was in the low 300 req/s. I added more models (up to 40) and it just highlighted that mongo can perform more queries, whilst dqlite was hovering around the same amount. It's clear that the rate-limiting is just hurting us. It's also clear we can't run without the semaphore (performance dropped), so we need to work out a better strategy for this. I'm unsure what.

Observations

Adding more models whilst under load is a lot slower for dqlite. This is obvious once you realize we're being rate-limited.
I've experimented with creating a poor man's hash ring in a branch. Long story short: in the dbaccessor worker, I create 3 dqlite apps instead of one. Each dqlite app is created with its own transaction runner (each transaction runner has its own semaphore), and depending on the hash of the namespace, a request is sent to the appropriate dqlite app. The controller namespace (global) is special, as it is always at app 0. Each dqlite app is currently bound to an abstract domain socket. The distribution of the namespaces isn't perfect, and there's a tendency to favor one index over the others (see below). We could potentially have a round-robin instead, which would give a fairer distribution. The code gives no consideration at all to scaling down the dqlite apps if a model goes away, although I do have ideas on how to do this.
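As an illustration of the routing described above (names are hypothetical; the actual branch isn't shown here):

```go
package dbaccessor

import "hash/fnv"

const controllerNamespace = "controller"

// appIndex routes a database namespace to one of the dqlite apps. The
// controller (global) namespace is pinned to app 0; model namespaces
// are hashed into a bucket. Plain modulo hashing like this also shows
// why the distribution can be uneven: nothing rebalances the buckets.
func appIndex(namespace string, numApps int) int {
	if namespace == controllerNamespace {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(namespace))
	return int(h.Sum32() % uint32(numApps))
}
```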
The great thing about this change was that all of the code changes were located in two places: the dbaccessor worker and the bootstrap dqlite code.

Results

Note: this wasn't testing 3.6 at the same time, although in the prior runs the tests weren't taxing the system, so keep that in mind. The number of dqlite applications was 3, hash-indexed between the 3, with a bias on index 1. The number of iterations rose above 1000, peaking at 1337.

Observations

We're not scaling linearly, as that would put us in the 900 ballpark (300 req/s × 3 dqlite apps). I wonder if, when we're not on the controller application, we scale much better. Reads vs writes is coming into play.
Further to the prior comment, I created 6 dqlite applications and we did not scale superlinearly: we top out at 1736 req/s. I then decided to add more models, and the writes just tanked all the reads, which you can see in the screenshot below; the iteration count just tanks.

Observations

If the reads are performing well, a series of writes can dramatically ruin the read performance. A potential solution is to split writes and reads, which is possible in sqlite; I doubt it's possible for dqlite, but it is worth exploring.
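As a sketch of what that read/write split looks like for plain sqlite (using mattn/go-sqlite3 DSN options as an assumption; whether dqlite could support anything similar is the open question above):

```go
package main

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

// openSplit returns a single-connection writer and a read-only reader
// pool for the same database file. With WAL journaling, readers do
// not block the writer and vice versa, so heavy writes no longer
// starve the read path.
func openSplit(path string) (writer, reader *sql.DB, err error) {
	writer, err = sql.Open("sqlite3", "file:"+path+"?_journal_mode=WAL")
	if err != nil {
		return nil, nil, err
	}
	writer.SetMaxOpenConns(1) // one dedicated write connection

	reader, err = sql.Open("sqlite3", "file:"+path+"?mode=ro&_journal_mode=WAL")
	if err != nil {
		return nil, nil, err
	}
	return writer, reader, nil
}
```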
Cutting the semaphore in half (24 to 12) didn't reveal any difference in req/s; the number ended up being 1326 req/s.

Observations

There seems to be a fine balance between backing off on the semaphore and overwhelming dqlite. Removing the semaphore or doubling it makes things dramatically worse, while reducing it by half didn't make any real difference. I'm wondering whether setting it to 2 would show any changes, or whether we've just maxed out the throughput of dqlite.
The actual perf worker had 6 options plus a default, but was using % 6, which would never have returned a 6, so the default case was never sampled. It also ran 20 steps per step worker every second, which is a nice round number, but means it didn't sample the actual steps evenly. 21 divides cleanly by 7, so it gives a fairer sampling and test.
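A toy illustration of both issues, using the numbers from the comment above (the actual worker code differs):

```go
package main

import "fmt"

func main() {
	const cases = 7 // 6 options plus a default

	// Old behaviour: 20 steps with `% 6` never selects case 6 at all,
	// and the remaining cases get an uneven 4/4/3/3/3/3 split.
	old := make([]int, cases)
	for i := 0; i < 20; i++ {
		old[i%6]++
	}
	fmt.Println(old) // [4 4 3 3 3 3 0]

	// New behaviour: 21 steps with `% 7` hit every case exactly three
	// times, since 21 divides cleanly by 7.
	now := make([]int, cases)
	for i := 0; i < 21; i++ {
		now[i%cases]++
	}
	fmt.Println(now) // [3 3 3 3 3 3 3]
}
```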
As a point, Simon asked "so how much can we throw at 3.6?". So I scaled up to 200 models, which at most runs 40k steps/second. I then took it a step further and went to 300 models, which should cap out at 60k steps/second.
So about 18 CPU cores fully engaged. (I have 12 E cores and 8 P cores, for a total of 28 threads.)
How hard can we actually hit dqlite from inside of Juju? Let's find out!
The premise is simple: can we afford to run multiple read transactions with wild abandon?
Before this change, a spike was done to test the premise, but from outside of Juju, going via the CLI and the apiserver. The problem we hit pretty quickly was that we required a fmutex to access the local yaml files found in `~/.local/share/juju`. Hacking that out of the way allowed us to reach 800 tps (transactions per second). After doing the spike, the thought was to remove the overhead of the CLI, WebSockets, and API server/facades to get a true figure.

This proposal introduces a model worker. The model worker can acquire the service factory for that given model and then run several read-only queries. This removes the potential for accidental writes into the database whilst testing this setup.
The perf worker creates a new step worker every 30 seconds. Each step worker then runs a series of iterations (currently 20) every second, so the load builds up over time; there is currently no upper limit (this will likely change).
The step function then makes a series of calls against the controller and model databases.
To add more work and transactions, add more models.
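A minimal sketch of that shape, with hypothetical names (the actual worker in this PR wires into Juju's dependency engine and service factory):

```go
package perf

import (
	"context"
	"time"
)

// runPerfWorker spawns another step worker every 30 seconds, so the
// total load ramps up over time with no upper bound.
func runPerfWorker(ctx context.Context, step func(context.Context)) {
	spawn := time.NewTicker(30 * time.Second)
	defer spawn.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-spawn.C:
			go runStepWorker(ctx, step)
		}
	}
}

// runStepWorker fires a fixed batch of read-only steps against the
// controller and model databases once per second. The batch size is
// 20 here, per the description; the conversation above discusses
// moving it to 21 for fairer sampling.
func runStepWorker(ctx context.Context, step func(context.Context)) {
	tick := time.NewTicker(time.Second)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			for i := 0; i < 20; i++ {
				step(ctx)
			}
		}
	}
}
```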
Checklist
QA steps
Download the bundle and overlay https://gist.github.com/SimonRichardson/0283807cdc026c4ecfe557891958f5b6
Alternatively, deploy the COS stack and use it to monitor the controller.
Links
Jira card: https://warthogs.atlassian.net/browse/JUJU-6625