dataframe.read_parquet crashed with DefaultAzureCredential cannot be deterministically hashed #11610

Open
seanslma opened this issue Dec 18, 2024 · 0 comments · May be fixed by #11666
seanslma commented Dec 18, 2024

Describe the issue:
Dask 2024.2.1 on Python 3.9 works as expected.

Dask 2024.12.0 on Python 3.12 crashes with:

  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/utils.py", line 772, in __call__
    return meth(arg, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/tokenize.py", line 159, in normalize_seq
    return type(seq).__name__, _normalize_seq_func(seq)
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/tokenize.py", line 152, in _normalize_seq_func
    return tuple(map(_inner_normalize_token, seq))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/tokenize.py", line 146, in _inner_normalize_token
    return normalize_token(item)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/utils.py", line 772, in __call__
    return meth(arg, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/tokenize.py", line 210, in normalize_object
    _maybe_raise_nondeterministic(
  File "/home/user/conda-envs/dev-env/lib/python3.12/site-packages/dask/tokenize.py", line 89, in _maybe_raise_nondeterministic
    raise TokenizationError(msg)
dask.tokenize.TokenizationError: Object <azure.identity.aio._credentials.default.DefaultAzureCredential object at 0x7fb2dad44d40> cannot be deterministically hashed. See https://docs.dask.org/en/latest/custom-collections.html#implementing-deterministic-hashing for more information.
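
For what it's worth, the docs page referenced in the error describes registering a tokenization handler for objects that cannot otherwise be hashed deterministically. A minimal, hypothetical sketch of that kind of workaround (not the fix from the linked PR) would hash the credential by its type name only:

from azure.identity.aio import DefaultAzureCredential
from dask.base import normalize_token

@normalize_token.register(DefaultAzureCredential)
def _tokenize_default_azure_credential(cred):
    # Hypothetical workaround: treat every DefaultAzureCredential instance as
    # equivalent for hashing, so dask.tokenize no longer raises TokenizationError.
    return type(cred).__name__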

Note that in the following example, if I replace storage_options with filesystem, it works.

from adlfs.spec import AzureBlobFileSystem

# Build the filesystem explicitly from the same options and pass it to
# read_parquet via filesystem= instead of storage_options=.
filesystem = AzureBlobFileSystem(
    **storage_options,
)
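
A minimal sketch of that working variant, assuming the same imports, paths, schema, and column selection as in the MCVE below; the only change is passing the filesystem object instead of storage_options:

d = dd.read_parquet(
    [
        'az://my-container/2024-12-17/file1.parquet',
        'az://my-container/2024-12-17/file2.parquet',
    ],
    index=False,
    columns=['dev_code'],
    engine='pyarrow',
    filesystem=filesystem,  # instead of storage_options=storage_options
    schema=DEV_PA_SCHEMAS,
)['dev_code'].unique().compute()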

Minimal Complete Verifiable Example:

import pyarrow as pa
import dask.dataframe as dd
from azure.identity.aio import DefaultAzureCredential

DEV_PA_SCHEMAS = pa.schema([
    ('dev_code', pa.string()),
    ('dev_value', pa.float64()),
])

storage_options = dict(
    account_name='my_azure_blob_storage_name',
    credential=DefaultAzureCredential(),
)

d = dd.read_parquet(
    [
        'az://my-container/2024-12-17/file1.parquet',
        'az://my-container/2024-12-17/file2.parquet',
    ],
    filters=None,
    index=False,
    columns=['dev_code'],
    engine='pyarrow',
    storage_options=storage_options,
    open_file_options=dict(precache_options=dict(method='parquet')),
    schema=DEV_PA_SCHEMAS,
)['dev_code'].unique().compute()

Anything else we need to know?:

Environment: Azure Kubernetes pod

  • Dask version: 2024.12.0
  • Python version: 3.12.8
  • Operating System: Ubuntu 22.04
  • Install method (conda, pip, source): conda
  • Pandas version: 2.2.3
  • Pyarrow version: 18.1.0
github-actions bot added the 'needs triage' label Dec 18, 2024
phofl added the 'dask-expr' label and removed 'needs triage' Dec 18, 2024
phofl linked a pull request Jan 13, 2025 that will close this issue