bug: group-by with the same root name and different output names raises #11612

Open
MarcoGorelli opened this issue Dec 18, 2024 · 0 comments
Labels: dask-expr, enhancement (Improve existing functionality or make things work better)

MarcoGorelli commented Dec 18, 2024

Describe the issue:

Minimal Complete Verifiable Example:

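pandas accepts a named aggregation that applies the same function to the same column under two different output names. The equivalent Dask call raises `ValueError: conflicting aggregation functions`: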

In [1]: import dask.dataframe as dd

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'a': [1,1,2], 'b': [4,5,6]})

In [4]: df.groupby('a').agg(c=('b', 'mean'), d=('b', 'mean'))
Out[4]: 
     c    d
a          
1  4.5  4.5
2  6.0  6.0

In [5]: dd.from_pandas(df).groupby('a').agg(c=('b', 'mean'), d=('b', 'mean'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 dd.from_pandas(df).groupby('a').agg(c=('b', 'mean'), d=('b', 'mean'))

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_groupby.py:1955, in GroupBy.agg(self, *args, **kwargs)
   1954 def agg(self, *args, **kwargs):
-> 1955     return self.aggregate(*args, **kwargs)

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_groupby.py:1934, in GroupBy.aggregate(self, arg, split_every, split_out, shuffle_method, **kwargs)
   1931 if arg == "size":
   1932     return self.size()
-> 1934 result = new_collection(
   1935     GroupbyAggregation(
   1936         self.obj.expr,
   1937         arg,
   1938         self.observed,
   1939         self.dropna,
   1940         split_every,
   1941         split_out,
   1942         self.sort,
   1943         shuffle_method,
   1944         self._slice,
   1945         *self.by,
   1946     )
   1947 )
   1948 if relabeling and result is not None:
   1949     if order is not None:

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_collection.py:4835, in new_collection(expr)
   4833 def new_collection(expr):
   4834     """Create new collection from an expr"""
-> 4835     meta = expr._meta
   4836     expr._name  # Ensure backend is imported
   4837     return get_collection_type(meta)(expr)

File ~/.local/share/uv/python/cpython-3.12.6-linux-x86_64-gnu/lib/python3.12/functools.py:993, in cached_property.__get__(self, instance, owner)
    991 val = cache.get(self.attrname, _NOT_FOUND)
    992 if val is _NOT_FOUND:
--> 993     val = self.func(instance)
    994     try:
    995         cache[self.attrname] = val

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_groupby.py:439, in GroupbyAggregation._meta(self)
    437 @functools.cached_property
    438 def _meta(self):
--> 439     return self._lower()._meta

File ~/.local/share/uv/python/cpython-3.12.6-linux-x86_64-gnu/lib/python3.12/functools.py:993, in cached_property.__get__(self, instance, owner)
    991 val = cache.get(self.attrname, _NOT_FOUND)
    992 if val is _NOT_FOUND:
--> 993     val = self.func(instance)
    994     try:
    995         cache[self.attrname] = val

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_reductions.py:440, in ApplyConcatApply._meta(self)
    438 @functools.cached_property
    439 def _meta(self):
--> 440     meta = self._meta_chunk
    441     aggregate = self.aggregate or (lambda x: x)
    442     if self.combine:

File ~/.local/share/uv/python/cpython-3.12.6-linux-x86_64-gnu/lib/python3.12/functools.py:993, in cached_property.__get__(self, instance, owner)
    991 val = cache.get(self.attrname, _NOT_FOUND)
    992 if val is _NOT_FOUND:
--> 993     val = self.func(instance)
    994     try:
    995         cache[self.attrname] = val

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_groupby.py:213, in GroupByApplyConcatApply._meta_chunk(self)
    210 @functools.cached_property
    211 def _meta_chunk(self):
    212     meta = meta_nonempty(self.frame._meta)
--> 213     return self.chunk(meta, *self._by_meta, **self.chunk_kwargs)

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_groupby.py:530, in DecomposableGroupbyAggregation.chunk_kwargs(self)
    527 @property
    528 def chunk_kwargs(self) -> dict:
    529     return {
--> 530         "funcs": self.agg_args["chunk_funcs"],
    531         "sort": self.sort,
    532         **_as_dict("observed", self.observed),
    533         **_as_dict("dropna", self.dropna),
    534     }

File ~/.local/share/uv/python/cpython-3.12.6-linux-x86_64-gnu/lib/python3.12/functools.py:993, in cached_property.__get__(self, instance, owner)
    991 val = cache.get(self.attrname, _NOT_FOUND)
    992 if val is _NOT_FOUND:
--> 993     val = self.func(instance)
    994     try:
    995         cache[self.attrname] = val

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask_expr/_groupby.py:411, in GroupbyAggregationBase.agg_args(self)
    408 @functools.cached_property
    409 def agg_args(self):
    410     keys = ["chunk_funcs", "aggregate_funcs", "finalizers"]
--> 411     return dict(zip(keys, _build_agg_args(self.spec)))

File ~/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/groupby.py:875, in _build_agg_args(spec)
    873 for funcs in by_name.values():
    874     if len(funcs) != 1:
--> 875         raise ValueError(f"conflicting aggregation functions: {funcs}")
    877 chunks = {}
    878 aggs = {}

ValueError: conflicting aggregation functions: [('mean', 'b'), ('mean', 'b')]
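
Possible workaround, as a minimal sketch rather than a fix for the underlying check: aggregate each distinct (column, function) pair once, then copy the result into the extra output names. `split_duplicate_aggs` and `agg_with_aliases` are hypothetical helper names, not part of Dask or pandas. The sketch assumes each aggregation function is hashable (for example a string such as 'mean') and that the grouped object accepts keyword-style named aggregation and returns a frame with `assign`.

```python
def split_duplicate_aggs(named_aggs):
    """Split named aggregations into unique specs plus aliases.

    `named_aggs` maps output name -> (column, func), the same shape that is
    passed as keyword arguments to groupby(...).agg(...).
    Returns (unique, aliases): `unique` keeps the first output name seen for
    each (column, func) pair, and `aliases` maps every later output name to
    that first name.
    """
    first_name = {}  # (column, func) -> first output name seen
    unique = {}
    aliases = {}
    for out_name, spec in named_aggs.items():
        key = tuple(spec)
        if key in first_name:
            aliases[out_name] = first_name[key]
        else:
            first_name[key] = out_name
            unique[out_name] = key
    return unique, aliases


def agg_with_aliases(grouped, **named_aggs):
    """Hypothetical wrapper: aggregate once per unique spec, then copy columns."""
    unique, aliases = split_duplicate_aggs(named_aggs)
    result = grouped.agg(**unique)
    if aliases:
        # Each alias reuses a column that has already been computed.
        result = result.assign(**{new: result[old] for new, old in aliases.items()})
    # Restore the column order the caller asked for.
    return result[list(named_aggs)]


unique, aliases = split_duplicate_aggs({"c": ("b", "mean"), "d": ("b", "mean")})
```

For the example in this issue, `unique` holds only the `c` spec and `aliases` maps `d` to `c`, so a call such as `agg_with_aliases(dd.from_pandas(df).groupby('a'), c=('b', 'mean'), d=('b', 'mean'))` would only ever pass one aggregation per (column, function) pair to Dask.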

Anything else we need to know?:

Spotted in Narwhals (because we are so awesome 😎 )

Environment:

  • Dask version: 2024.12.1
  • Python version: 3.12
  • Operating System: linux
  • Install method (conda, pip, source): pip