Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate legacy low level optimizers #11458

Open
fjetter opened this issue Oct 25, 2024 · 5 comments
Open

Deprecate legacy low level optimizers #11458

fjetter opened this issue Oct 25, 2024 · 5 comments
Labels
deprecation Something is being removed discussion Discussing a topic with no specific actions yet

Comments

@fjetter
Copy link
Member

fjetter commented Oct 25, 2024

We're offering some functionality for low level task graph rewrites/optimizations, see https://docs.dask.org/en/stable/optimize.html

I propose to deprecate and remove most, if not all, of this functionality. With #9969 most of this will break. Some/all of this could be migrated but I am not entirely convinced this is worth it.

Explicitly, I propose to remove

  • cull (keep internally)
  • fuse (keep internally, possibly replace with simpler task spec version)
  • fuse_linear (keep internally, possibly replace with simpler task spec version)
  • inline_functions (There is rudimentary support for this in task spec already)
  • functions_of
  • Rewrite Rules

(Re)Moving these things also means that magical kwargs to compute like fuse_keys, fast_functions, inline_functions_fast_functions would vanish.

For now, we'd likely keep the fuse functions but keep it pretty much internal. It is currently only executed for the array code so keeping it there in place still makes sense (for now) but I don't believe this is a valid function to expose.

I'm a bit on the fence regarding custom optimizers, e.g. the possibility to register an optimizer function with something like dask.config.set(array_optimize=my_optimizer), see also https://docs.dask.org/en/stable/optimize.html#customizing-optimization I suspect there are very few power users even capable of writing this kind of code. It will certainly break and I'm not entirely convinced this is worth restoring.

cc @mrocklin in case you have insights or an opinion on any of this

@github-actions github-actions bot added the needs triage Needs a response from a contributor label Oct 25, 2024
@fjetter fjetter added discussion Discussing a topic with no specific actions yet deprecation Something is being removed and removed needs triage Needs a response from a contributor labels Oct 25, 2024
@mrocklin
Copy link
Member

I'm generally ok with removing optimizers if we think they aren't useful for performance. I'd want to verify that things like cull really are handled broadly. You'd probably have a better sense for that then me though.

In general, I don't have conviction that this is a bad idea, so no blockers from me (but I can't say if it's a good idea with any confidence)

@mrocklin
Copy link
Member

Also, it's probably worth checking with a few of the downstream collections like xarray, rapids, geopandas

@fjetter
Copy link
Member Author

fjetter commented Nov 19, 2024

With #11525 the first couple low level optimizers are removed from the default optimization chain for arrays. Once we remove the legacy dataframe I believe most of the low level optimizers are dead

@Quasorglow
Copy link

We have a scenario where two delayed stages which we want to combine/fuse, so we were exploring how two levels can be fused together. If low level optimisers are going to be deprecated, what are the other ways of achieving similar optimisations like task fusion.

@phofl
Copy link
Collaborator

phofl commented Dec 16, 2024

We added an option for this a few releases ago:

dask.config.set({"optimization.fuse.delayed": True})

Could you try this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation Something is being removed discussion Discussing a topic with no specific actions yet
Projects
None yet
Development

No branches or pull requests

4 participants