
[BUG] Error with dask-expr using categorical dtype #17415

Open
@trivialfis

Description

This is the same issue as rapidsai/dask-cuda#1408. Cross-posting here because it is more closely related to cuDF than to dask-cuda.

The following snippet works with DASK_DATAFRAME__QUERY_PLANNING=FALSE but fails when dask-expr (query planning) is enabled.

import dask
import dask.dataframe as dd
import dask_cuda
from dask_cuda import LocalCUDACluster
from distributed import Client


def main(client):
    print(dask_cuda.__version__)
    print(dask.__version__)
    df = dd.from_dict({"qid": [1, 2, 1, 0, 2]}, npartitions=3)
    df.qid.astype("category").cat.as_known().compute()


if __name__ == "__main__":
    with LocalCUDACluster() as cluster:
        with Client(cluster) as client:
            with dask.config.set(
                {"array.backend": "cupy", "dataframe.backend": "cudf"}
            ):
                main(client)

With dask-expr, the script fails with the following traceback:

Traceback (most recent call last):
  File "/home/jiamingy/workspace/xgboost_dev/XGBoostUtils/dask-issues/as-cat.py", line 20, in <module>
    main(client)
  File "/home/jiamingy/workspace/xgboost_dev/XGBoostUtils/dask-issues/as-cat.py", line 11, in main
    df.qid.astype("category").cat.as_known().compute()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiamingy/.anaconda/envs/xgboost_dev_125/lib/python3.11/site-packages/dask_expr/_categorical.py", line 83, in as_known
    return self.set_categories(categories.values)
                               ^^^^^^^^^^^^^^^^^
  File "/home/jiamingy/.anaconda/envs/xgboost_dev_125/lib/python3.11/site-packages/cudf/core/index.py", line 1636, in values
    return self._column.values
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jiamingy/.anaconda/envs/xgboost_dev_125/lib/python3.11/site-packages/cudf/core/column/string.py", line 5873, in values
    raise TypeError("String Arrays is not yet implemented in cudf")
TypeError: String Arrays is not yet implemented in cudf
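
For comparison, here is a minimal CPU-only sketch of the call pattern that the traceback points at, where `as_known` passes `categories.values` to `set_categories`. It uses pandas as a stand-in for cuDF, which is an assumption made purely for illustration; it does not reproduce the failure and is not a fix. It only shows that, on a pandas Index, `.values` is available for both integer and string data, whereas the traceback shows cuDF raising `TypeError` when `.values` is reached on a string column.

```python
import pandas as pd

# CPU-only illustration; pandas stands in for cuDF here (assumption).
s = pd.Series([1, 2, 1, 0, 2], name="qid").astype("category")

# Mirror the call seen in the traceback:
#     self.set_categories(categories.values)
categories = s.cat.categories
known = s.cat.set_categories(categories.values)
print(list(known.cat.categories))

# A string-typed pandas Index also supports `.values`; per the traceback,
# the analogous access on a cuDF string column raises TypeError instead.
string_values = pd.Index(["a", "b"]).values
print(list(string_values))
```

If the string-typed `.values` access is indeed the trigger, the difference between the two backends at this single attribute would explain why the same snippet succeeds on the CPU path.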

Environment overview

  • Environment location: Bare-metal

  • Method of cuDF install: conda

  • dask-cuda: 24.12.00a12

  • dask: 2024.10.0

  • Python: 3.11.10
