Stream the adhoc query results #2429

Merged
merged 6 commits into from
Sep 7, 2024

Conversation

gz
Collaborator

@gz gz commented Sep 5, 2024

No description provided.

@gz gz requested a review from ryzhyk September 5, 2024 09:16
@gz gz force-pushed the streaming-yo branch 2 times, most recently from 22339bc to ee2153a Compare September 5, 2024 09:39
crates/adapters/src/server/mod.rs (resolved review thread)
crates/pipeline-manager/src/api/pipeline.rs (resolved review thread)
crates/pipeline-manager/src/runner.rs (resolved review thread)
@gz gz force-pushed the streaming-yo branch 3 times, most recently from 5e8f5d2 to 1487edf Compare September 7, 2024 08:56
This prevents the manager from crashing on machines with many cores.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
If an adhoc query is sent, it is handled in the same
tokio runtime that runs the webserver.

This raises the question of how many cores the runtime should
have. Datafusion needs one task for each partition we're
reading from, and we have as many partitions as dbsp workers.
So maybe we should just change HTTP_NUM_WORKERS from 4 to
`config.global.runtime_workers`. Except this doesn't
quite help: if your query is very expensive, like
`select * from table_with_100m`, all its tasks run for
quite some time and starve out the HTTP requests, so
the UI no longer updates. OK, so maybe it should
just be `workers*2`, but if workers is set near the number
of CPU cores on the machine then we'll overprovision
the tokio runtime, and that will slow things down too.
So for now we just set this to #cores on the machine by
not configuring it (this is the default).
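
To make the trade-off above concrete, here is a minimal std-only sketch that compares the sizing options mentioned in this message. The `Policy` enum and the helper functions are hypothetical names invented for illustration; they are not part of the pipeline manager, tokio, or actix, and the model deliberately ignores everything except thread counts.

```rust
use std::thread::available_parallelism;

/// Candidate sizing policies discussed in the commit message.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Policy {
    /// A fixed count, e.g. the previous value of 4.
    Fixed(usize),
    /// One runtime thread per dbsp worker (one per partition).
    MatchWorkers,
    /// Twice the dbsp workers, leaving headroom for HTTP handlers.
    DoubleWorkers,
    /// One thread per core, i.e. leave the runtime unconfigured.
    AllCores,
}

/// Number of runtime threads a policy would ask for.
fn runtime_threads(policy: Policy, dbsp_workers: usize, cores: usize) -> usize {
    match policy {
        Policy::Fixed(n) => n,
        Policy::MatchWorkers => dbsp_workers,
        Policy::DoubleWorkers => dbsp_workers * 2,
        Policy::AllCores => cores,
    }
}

/// Threads requested beyond the available cores (overprovisioning).
fn oversubscribed_by(threads: usize, cores: usize) -> usize {
    threads.saturating_sub(cores)
}

/// Threads still free for HTTP requests while every partition task of
/// one expensive query is busy.
fn spare_for_http(threads: usize, dbsp_workers: usize) -> usize {
    threads.saturating_sub(dbsp_workers)
}

fn main() {
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    let dbsp_workers = 8; // assumed value, for illustration only
    let policies = [
        Policy::Fixed(4),
        Policy::MatchWorkers,
        Policy::DoubleWorkers,
        Policy::AllCores,
    ];
    for policy in policies {
        let threads = runtime_threads(policy, dbsp_workers, cores);
        println!(
            "{:?}: threads={}, spare_for_http={}, oversubscribed_by={}",
            policy,
            threads,
            spare_for_http(threads, dbsp_workers),
            oversubscribed_by(threads, cores)
        );
    }
}
```

In this simplified model, `MatchWorkers` leaves no spare thread for HTTP during an expensive query, and `DoubleWorkers` oversubscribes the machine once the worker count approaches the core count, which matches the reasoning above.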

As an aside, ideally we could just swap out the actix
runtime for `dbsp::runtime::TOKIO`, which is what we use
everywhere else and which already uses #cores threads
to run its tasks.
However, this doesn't work either because the actix
API has a stupid bug in its interface.

I submitted a PR to fix this here:
actix/actix-net#599

So for now, we'll just have two tokio runtimes, with

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
Also add the ability to abort a request with Ctrl+C.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
As opposed to sending them only once everything is computed,
which can take a long time.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
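
The last two commit messages (aborting a request and streaming results instead of sending them once everything is computed) can be illustrated with a small std-only sketch. It is not the code from this PR: `spawn_query` is a hypothetical helper, the "query" just produces integers, and dropping the receiver stands in for a client that aborts, for example with Ctrl+C.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread::{self, JoinHandle};

/// Spawn a fake query that produces `total_batches` batches of
/// `batch_rows` rows each and sends every batch as soon as it is ready.
/// The returned handle yields how many batches were actually sent.
/// (Hypothetical helper, for illustration only.)
fn spawn_query(
    total_batches: usize,
    batch_rows: usize,
) -> (Receiver<Vec<u64>>, JoinHandle<usize>) {
    // A bounded channel gives backpressure: the producer cannot run far
    // ahead of a slow or disconnected consumer.
    let (tx, rx) = sync_channel::<Vec<u64>>(1);
    let handle = thread::spawn(move || {
        let mut sent = 0;
        for b in 0..total_batches {
            let start = (b * batch_rows) as u64;
            let batch: Vec<u64> = (start..start + batch_rows as u64).collect();
            // `send` fails once the receiver is dropped, which is how the
            // producer learns that the client went away.
            if tx.send(batch).is_err() {
                break;
            }
            sent += 1;
        }
        sent
    });
    (rx, handle)
}

fn main() {
    let (rx, producer) = spawn_query(1_000, 4);

    // The first batches arrive long before the whole result is computed.
    for batch in rx.iter().take(2) {
        println!("received batch: {:?}", batch);
    }

    // Simulate the client aborting the request.
    drop(rx);

    let sent = producer.join().expect("producer thread panicked");
    println!("producer stopped after sending {} of 1000 batches", sent);
}
```

The bounded channel is a design choice of this sketch: with an unbounded channel the producer would keep computing and buffering batches even after the client stopped reading.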
@gz gz merged commit 847c4a1 into main Sep 7, 2024
5 checks passed
@gz gz deleted the streaming-yo branch September 7, 2024 17:44
2 participants