Unique keys runs slow with time #5965
I was looking at the code and saw that when entering tasks they are grouped by their prefix (that is split by
and had much better results, but can I clear the stats somehow?
e.g.

```python
async with Client(address='0.0.0.0:8786', asynchronous=True) as client:
    ran = randint(0, 100)
    f1 = client.submit(add, n, 1)
    f2 = client.submit(add, n, ran)
    f3 = client.submit(add, f1, f2)
    res = await f3
    assert res == ((n + 1) + (n + ran)), f"{res} != {((n + 1) + (n + ran))}"
```

Do you have a specific reason why you are building the graph yourself? The submit/map API is much more user-friendly. This will re-use keys since, by default, we assume functions to be pure, i.e. side-effect-free, deterministic, and therefore cachable. If this is not true, use the keyword pure=False for all your submit calls and we will generate unique keys for you.
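The key-reuse behavior described above can be sketched with the standard library alone. This is an illustration of the semantics, not Dask's actual tokenization: a pure call is keyed by a deterministic hash of the function and its arguments (so identical calls collide and get cached), while `pure=False` appends a fresh random token every time. The `make_key` helper is hypothetical.

```python
import hashlib
import uuid

def make_key(func, args, pure=True):
    """Mimic how a scheduler might name a task.
    Illustration only -- not Dask's real tokenization logic."""
    prefix = func.__name__
    if pure:
        # Deterministic: identical calls hash to the same key -> result reused.
        token = hashlib.sha1(repr((prefix, args)).encode()).hexdigest()[:8]
    else:
        # Impure: fresh random token per call -> unique key every time.
        token = uuid.uuid4().hex[:8]
    return f"{prefix}-{token}"

def add(a, b):
    return a + b

k1 = make_key(add, (1, 2), pure=True)
k2 = make_key(add, (1, 2), pure=True)
assert k1 == k2                                  # same call, same key

k3 = make_key(add, (1, 2), pure=False)
k4 = make_key(add, (1, 2), pure=False)
assert k3 != k4                                  # impure calls never share a key
```

Note that both schemes keep a stable `add-` prefix, which is what the scheduler groups on.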
Should the tasks be deleted from the client or from the scheduler? I used a graph because I wanted to build the graph only once and reuse it each time with different parameters. My use case has since changed and now I'm building the DAG every time, so I'll try going back to submit.
One thing to notice: it looks like you're opening 20 different clients at once, then submitting the "same" (same keys) operation from each of them. That should work, it's just unusual. Could you share the results of your
feels related to #5960. We've noticed this exact behavior before (re-running the same operation is slower and slower each time) but never figured out why: #4987 (comment).
Graph will actually be a tiny bit more efficient, since it's one network call instead of 4. Using
In that case, giving them new keys every time would be the semantically correct thing to do.
There's no method named
I'm generating a new key, but with the same prefix
I meant the
Continuing the discussion from the Dask forum: a few questions about the dashboard.
All of my
I have a DAG that I want to run in parallel multiple times, but each time with different input params.
Having the same key for all of the parallel runs isn't working, because the task will only run once, with the first run's params.
My solution was to generate a GUID as the key for each task.
But the issue is that using a GUID as the key creates a huge number of tasks, and the scheduler becomes slower over time.
Having a pool of keys solved the issue, but that isn't the solution I want.
Once a task finishes I don't care about it anymore, so if deleting the keys is an option and would solve the issue, that would be great.
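The approach described above (run-unique keys that keep a stable prefix, so prefix-based grouping on the scheduler still works) can be sketched with the standard library. `build_graph` and `execute` are hypothetical helpers, not Dask APIs; the graph dict mirrors Dask's `{key: (func, *args)}` task convention, and the values `n=5`, `ran=7` are made up for illustration.

```python
import uuid

def add(a, b):
    return a + b

def build_graph(n, ran):
    """Build the same DAG shape each run, but give every task a run-unique
    key that keeps a stable 'add-' prefix. Hypothetical helper."""
    run = uuid.uuid4().hex[:8]          # fresh token per run
    k1, k2, k3 = f"add-{run}-1", f"add-{run}-2", f"add-{run}-3"
    graph = {
        k1: (add, n, 1),                # n + 1
        k2: (add, n, ran),              # n + ran
        k3: (add, k1, k2),              # (n + 1) + (n + ran)
    }
    return graph, k3

def execute(graph, key):
    """Naive recursive evaluation of the toy graph (illustration only)."""
    cache = {}
    def get(k):
        if k not in cache:
            func, *args = graph[k]
            cache[k] = func(*(get(a) if a in graph else a for a in args))
        return cache[k]
    return get(key)

g1, out1 = build_graph(5, 7)
g2, out2 = build_graph(5, 7)
assert out1 != out2                     # each run gets distinct keys
assert execute(g1, out1) == (5 + 1) + (5 + 7)   # 18
```

Because every run produces fresh keys, parallel runs with different params never collide; the trade-off, as noted above, is that the scheduler accumulates task state unless the keys are released after each run.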
Environment: