Description
Suggestion
We relatively regularly have users asking about subclassing `DataArray` and `Dataset`, and I know of at least a few cases where people have gone through with it. However, we currently explicitly discourage doing this, on the basis that basically all operations will return a bare xarray object instead of the subclassed version, it's full of trip hazards, and we have the accessor interface to point people to instead.
However, while useful, the accessors aren't enough for some users, and I think we could probably do better. If we refactored internally we might be able to make it much easier to subclass.
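For reference, this is roughly what the accessor route looks like today. It is only a minimal sketch: the accessor name `demo` and the method `centered` are made up for illustration, and only `xr.register_dataarray_accessor` is actual xarray API.

```python
import xarray as xr


# "demo" is a hypothetical accessor name chosen for this example.
@xr.register_dataarray_accessor("demo")
class DemoAccessor:
    def __init__(self, xarray_obj):
        # xarray passes in the DataArray the accessor is attached to.
        self._obj = xarray_obj

    def centered(self):
        """Return the array with its mean subtracted."""
        return self._obj - self._obj.mean()


da = xr.DataArray([1.0, 2.0, 3.0], dims="x")
result = da.demo.centered()
print(type(result).__name__)
```

The extra behaviour lives under the `da.demo` namespace and the result is still a plain `DataArray`, which is exactly the limitation some users run into.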
Example to follow in Pandas
Pandas takes an interesting approach: while they also explicitly discourage subclassing, they still try to make it easier, and show you what you need to do in order for it to work.
They ask you to override some constructor properties with your own, and allow you to define your own original properties.
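As a rough sketch of the pandas pattern described above: the class names `UnitsSeries` and `UnitsFrame` and the `units` attribute are hypothetical, while `_constructor`, `_constructor_sliced`, `_constructor_expanddim` and `_metadata` are the hooks pandas documents for subclassing.

```python
import pandas as pd


class UnitsSeries(pd.Series):
    # Names listed in _metadata are treated as extra attributes of the
    # subclass rather than as data.
    _metadata = ["units"]

    @property
    def _constructor(self):
        # Used when an operation returns an object of the same dimension.
        return UnitsSeries

    @property
    def _constructor_expanddim(self):
        # Used when an operation goes from Series to DataFrame.
        return UnitsFrame


class UnitsFrame(pd.DataFrame):
    _metadata = ["units"]

    @property
    def _constructor(self):
        return UnitsFrame

    @property
    def _constructor_sliced(self):
        # Used when an operation goes from DataFrame to Series.
        return UnitsSeries


df = UnitsFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
print(type(df * 2).__name__)    # same-dimension result
print(type(df["a"]).__name__)   # sliced result
```

The point for xarray is that every internal code path that builds a result has to go through one of these hooks for the subclass to survive.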
Potential complications
- `.construct_dataarray` and `DataArray.__init__` are used a lot internally to reconstruct a DataArray from `dims`, `coords`, `data`, etc. before returning the result of a method call. We would probably need to standardise this before allowing users to override it.
- Pandas actually has multiple constructor properties you need to override: `_constructor`, `_constructor_sliced`, and `_constructor_expanddim`. What's the minimum set of similar constructors we would need?
- Blocking access to attributes: we currently stop people from adding their own attributes quite aggressively, so that we can have attributes as an alias for variables and attrs. We would need to either relax this or, better, allow users to set a list of their own `_properties` which they want to register, similar to pandas.
- `__slots__`: I think something funky can happen if you inherit from a class that defines `__slots__`?
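To make the constructor and `__slots__` points concrete, here is a toy sketch in plain Python. None of it is xarray code: `Base`, `Plain`, `Tagged`, `scaled` and `can_set_new_attribute` are invented for illustration, and `_constructor` is just borrowing the pandas naming as an assumption about what a hook could look like.

```python
class Base:
    """Toy stand-in for a container class that defines __slots__."""

    __slots__ = ("data", "name")

    def __init__(self, data, name=None):
        self.data = list(data)
        self.name = name

    @property
    def _constructor(self):
        # Hypothetical hook: every method builds its result through this.
        return Base

    def scaled(self, factor):
        return self._constructor([x * factor for x in self.data], name=self.name)


class Plain(Base):
    # No _constructor override and no __slots__ of its own.
    pass


class Tagged(Base):
    # Declares its one extra attribute explicitly and overrides the hook.
    __slots__ = ("units",)

    def __init__(self, data, name=None, units=None):
        super().__init__(data, name=name)
        self.units = units

    @property
    def _constructor(self):
        return Tagged


def can_set_new_attribute(obj):
    """Return True if an undeclared attribute can be set on obj."""
    try:
        obj.scratch = 1
    except AttributeError:
        return False
    return True


print(type(Plain([1, 2]).scaled(2)).__name__)
print(type(Tagged([1, 2], units="m").scaled(2)).__name__)
print(can_set_new_attribute(Base([1])))
print(can_set_new_attribute(Plain([1])))
print(can_set_new_attribute(Tagged([1])))
```

Two things fall out of this. A subclass that does not override the hook gets the base type back from every method, and a subclass that omits `__slots__` silently gains a `__dict__`, so the attribute blocking of the parent no longer applies to it. Note also that `Tagged.scaled` drops `units`, because the constructor is called without it; propagating registered attributes would need its own mechanism, which is what pandas uses `_metadata` for.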
Documentation
I think if we do this we should also slightly refactor the relevant docs to make clear the distinction between 3 groups of people:
- Users - People who import and use xarray at the top-level with (ideally) no particular concern as to how it works. This is who the vast majority of the documentation is for.
- Developers - People who are actually improving and developing xarray upstream. This is who the Contributing to xarray page is for.
- Extenders - People who want to subclass, accessorize or wrap xarray objects, in order to do something more complicated. These people are probably writing a domain-specific library which will then bring in a new set of users. There maybe aren't as many of these people, but they are really important IMO. This is implicitly who the xarray internals page is aimed at, but it would be nice to make that distinction much more clear. It might also be nice to give them a guide as to "I want to achieve X, should I use wrapping/subclassing/accessors?"
@max-sixty you had some ideas about what would need to be done for this to work?