
dask scalars don't preserve dtype #11637

Open
@MarcoGorelli

Description

Describe the issue:

Minimal Complete Verifiable Example:

```python
In [2]: import pandas as pd

In [3]: import dask.dataframe as dd

In [4]: from datetime import date, datetime, timedelta

In [5]: dfpd = pd.DataFrame({'a': [date(2020,1,1)]}, dtype='date32[pyarrow]')

In [6]: df = dd.from_pandas(dfpd)

In [7]: df
Out[7]: 
Dask DataFrame Structure:
                                  a
npartitions=1                      
0              date32[day][pyarrow]
0                               ...
Dask Name: frompandas, 1 expression
Expr=df

In [8]: df.assign(b=df['a'][0])
Out[8]: 
Dask DataFrame Structure:
                                  a       b
npartitions=1                              
0              date32[day][pyarrow]  object
0                               ...     ...
Dask Name: assign, 4 expressions
Expr=Assign(frame=df)
```

Column 'b' now has dtype 'object', but I would have expected it to preserve the date32[day][pyarrow] dtype of column 'a'.

Anything else we need to know?: This was spotted in Narwhals.

Environment:

  • Dask version: 2024.10.0
  • Python version:
  • Operating System:
  • Install method (conda, pip, source):
