Reorganized anticipated API changes in README
shoyer committed Apr 20, 2014
1 parent ec3ec65 commit 79f9343
Showing 1 changed file with 15 additions and 23 deletions.
README.md
@@ -180,36 +180,28 @@ Don't forget to `git fetch` regular updates!
 
 Aspects of the API that we currently intend to change:
 
-- ~~Integer indexing on `Datasets` with 1-dimensional variables (via
-  `indexed_by` or `labeled_by`) will turn those variables into 0-dimensional
-  (scalar) variables instead of dropping them.~~
-- ~~The primitive `XArray` object will be removed from the public API.
-  `DataArray` will be used instead in all public interfaces.~~
-- The constructor for `DataArray` objects will change, so that it is possible
-  to create new `DataArray` objects without putting them into a `Dataset`
-  first.
-- ~~We currently check `var.attributes['coordinates']` for figuring out which
-  variables to select with `Dataset.select`. This will probably be removed:
-  we don't want users to rely on attribute metadata that is not necessarily
-  maintained by array operations.~~
+- The constructor for `DataArray` objects will probably change, so that it
+  is possible to create new `DataArray` objects without putting them into a
+  `Dataset` first.
 - Array reduction methods like `mean` may change to NA skipping versions
   (like pandas).
-- Array indexing will be made lazy, instead of immediately creating an
-  ndarray. This will make it easier to subsample from very large Datasets
-  using the `indexed_by` and `labeled_by` methods. We might need to add a
-  special method to allow for explicitly caching values in memory.
 - We will automatically align `DataArray` objects when doing math. Most
   likely, we will use an inner join (unlike pandas's outer join), because an
   outer join can result in ridiculous memory blow-ups when working with high
   dimensional arrays.
-
-Once we finalize these aspects of the API and improve the documentation, we
+- Future versions of xray will add better support for working with datasets
+  too big to fit into memory, probably by wrapping libraries like
+  [blaze][blaze]/[blz][blz] or [biggus][biggus]. More immediately:
+  - Array indexing will be made lazy, instead of immediately creating an
+    ndarray. This will make it easier to subsample from very large Datasets
+    incrementally using the `indexed` and `labeled` methods. We might need to
+    add a special method to allow for explicitly caching values in memory.
+  - We intend to support `Dataset` objects linked to NetCDF or HDF5 files on
+    disk to allow for incremental writing of data.
+
+Once we get the API in a state we're comfortable with and improve the documentation, we
 intend to release version 0.1. Our target is to do so before the xray talk on
-May 3, 2014 at [PyData Silicon Valley][pydata]. Future versions of xray will
-add better support for working with datasets too big to fit into memory,
-probably by wrapping libraries like [blaze][blaze]/[blz][blz] or
-[biggus][biggus]. At a minimum, we intend to support `Dataset` objects linked
-to NetCDF or HDF5 files on disk to allow for incremental writing of data.
+May 3, 2014 at [PyData Silicon Valley][pydata].
 
 [pydata]: http://pydata.org/sv2014/
 [blaze]: https://github.com/ContinuumIO/blaze/
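
One README item above anticipates pandas-style "NA skipping" reductions. The
difference from NumPy's default behavior can be shown without xray at all;
this only illustrates the convention being referenced, not xray's
(then-undecided) reduction API:

```python
import numpy as np
import pandas as pd

values = np.array([1.0, np.nan, 3.0])

print(np.mean(values))           # nan: NumPy propagates missing values
print(np.nanmean(values))        # 2.0: an explicit NA-skipping reduction
print(pd.Series(values).mean())  # 2.0: pandas skips NaN by default
```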
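
The alignment item contrasts an intended inner join with pandas's outer-join
arithmetic. The pandas half of this sketch is real behavior; the inner-join
half is simulated by hand with an index intersection, since xray's automatic
alignment did not exist yet:

```python
import pandas as pd

a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'z'])

# pandas aligns arithmetic on the union of labels (an outer join),
# so labels present in only one operand become NaN:
print(a + b)  # x: NaN, y: 12.0, z: NaN

# an inner join would keep only the labels shared by both operands:
common = a.index.intersection(b.index)
print(a[common] + b[common])  # y: 12
```

For two short Series the difference is only a few NaNs; for the
high-dimensional arrays mentioned in the README item, the union-sized result
is what causes the memory blow-up.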

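
The lazy indexing item is about deferring ndarray creation until values are
explicitly cached in memory. The toy class below sketches that idea in
isolation; the class and method names are hypothetical, and this is not how
xray implements (or planned to implement) it:

```python
import numpy as np

class LazySlice(object):
    """Record an indexing operation on a deferred source array."""
    def __init__(self, load, key=slice(None)):
        self._load = load   # callable producing the full ndarray
        self._key = key     # indexer to apply once values are needed
        self._cache = None

    def __getitem__(self, key):
        # toy version: the key replaces rather than composes, so only one
        # round of indexing on the full array is supported
        return LazySlice(self._load, key)

    def load(self):
        """Explicitly cache the selected values in memory."""
        if self._cache is None:
            self._cache = np.asarray(self._load())[self._key]
        return self._cache

# the underlying array is not created until .load() is called
source = LazySlice(lambda: np.arange(1000000).reshape(1000, 1000))
subset = source[:10, :10]
print(subset.load().shape)  # (10, 10)
```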
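
Finally, for `Dataset` objects linked to NetCDF or HDF5 files on disk: the
kind of incremental writing being described is already possible with the plain
netCDF4-python library, and a linked xray `Dataset` would presumably build on
something similar. The file and variable names here are made up for
illustration, and this is not xray's planned API:

```python
import numpy as np
from netCDF4 import Dataset

nc = Dataset('incremental.nc', 'w')
nc.createDimension('time', None)  # unlimited dimension, grows as data arrives
temp = nc.createVariable('temperature', 'f4', ('time',))

for step in range(3):
    # append one chunk at a time instead of holding everything in memory
    temp[step * 10:(step + 1) * 10] = np.random.rand(10)

nc.close()
```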