API reorganization
Renamed "XArray" back to "Variable" and a bunch of associated names. Also
renamed the "data" attribute to "values" to match pandas (closes pydata#97). Using
any of the old names should still work (for now) but raise a warning.
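
The backwards-compatibility behaviour described in the commit message (old names keep working but warn) could be sketched roughly as follows. This is a toy illustration, not the code from this commit: the minimal `Variable` class, the `XArray` wrapper function, and the choice of `FutureWarning` are all assumptions made for the example.

```python
import warnings

import numpy as np


class Variable:
    """Toy stand-in for the renamed class; not xray's actual implementation."""

    def __init__(self, dimensions, values):
        self.dimensions = tuple(dimensions)
        self.values = np.asarray(values)

    @property
    def data(self):
        # Old attribute name: still works, but warns and forwards to `values`.
        warnings.warn("the 'data' attribute is deprecated; use 'values' instead",
                      FutureWarning, stacklevel=2)
        return self.values


def XArray(*args, **kwargs):
    # Old class name kept as a thin wrapper (hypothetical shim).
    warnings.warn("'XArray' has been renamed to 'Variable'",
                  FutureWarning, stacklevel=2)
    return Variable(*args, **kwargs)
```

With a shim like this, existing callers of `XArray(...)` or `.data` continue to run and see a warning that names the replacement.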
shoyer committed Apr 13, 2014
1 parent 4713be2 commit 3753832
Showing 22 changed files with 855 additions and 777 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -183,8 +183,8 @@ Aspects of the API that we currently intend to change:
- ~~Integer indexing on `Datasets` with 1-dimensional variables (via
`indexed_by` or `labeled_by`) will turn those variables into 0-dimensional
(scalar) variables instead of dropping them.~~
- The primitive `XArray` object will be removed from the public API.
`DataArray` will be used instead in all public interfaces.
- ~~The primitive `XArray` object will be removed from the public API.
`DataArray` will be used instead in all public interfaces.~~
- The constructor for `DataArray` objects will change, so that it is possible
to create new `DataArray` objects without putting them into a `Dataset`
first.
2 changes: 0 additions & 2 deletions doc/api.rst
@@ -166,5 +166,3 @@ Top-level functions
:toctree: generated/

align
encode_cf_datetime
decode_cf_datetime
26 changes: 13 additions & 13 deletions doc/data-structures.rst
@@ -1,20 +1,20 @@
Data structures
===============

``xray``'s core data structures are the ``Dataset``, ``XArray`` and
``xray``'s core data structures are the ``Dataset``, ``Variable`` and
``DataArray``.

Dataset
-------

``Dataset`` is netcdf-like object consisting of **variables** (a dictionary of
XArray objects) and **attributes** (an ordered dictionary) which together form a
self-describing data set.
Variable objects) and **attributes** (an ordered dictionary) which together
form a self-describing data set.

XArray
------
Variable
--------

``XArray`` implements **xray's** basic extended array object. It supports the
``Variable`` implements **xray's** basic extended array object. It supports the
numpy ndarray interface, but is extended to support and use metadata. It
consists of:

@@ -25,28 +25,28 @@ consists of:
3. **attributes**: An ordered dictionary of additional metadata to associate
with this array.

The main functional difference between XArrays and numpy.ndarrays is that
numerical operations on XArrays implement array broadcasting by dimension
name. For example, adding an XArray with dimensions `('time',)` to another
XArray with dimensions `('space',)` results in a new XArray with dimensions
The main functional difference between Variables and numpy.ndarrays is that
numerical operations on Variables implement array broadcasting by dimension
name. For example, adding an Variable with dimensions `('time',)` to another
Variable with dimensions `('space',)` results in a new Variable with dimensions
`('time', 'space')`. Furthermore, numpy reduce operations like ``mean`` or
``sum`` are overwritten to take a "dimension" argument instead of an "axis".

XArrays are light-weight objects used as the building block for datasets.
Variables are light-weight objects used as the building block for datasets.
However, usually manipulating data in the form of a DataArray should be
preferred (see below), because they can use more complete metadata in the full
of other dataset variables.

DataArray
---------

``DataArray`` is a flexible hybrid of Dataset and XArray that attempts to
``DataArray`` is a flexible hybrid of Dataset and Variable that attempts to
provide the best of both in a single object. Under the covers, DataArrays
are simply pointers to a dataset (the ``dataset`` attribute) and the name of a
"focus variable" in the dataset (the ``focus`` attribute), which indicates to
which variable array operations should be applied.

DataArray objects implement the broadcasting rules of XArray objects, but
DataArray objects implement the broadcasting rules of Variable objects, but
also use and maintain coordinates (aka "indices"). This means you can do
intelligent (and fast!) label based indexing on DataArrays (via the
``.loc`` attribute), do flexibly split-apply-combine operations with
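
The broadcasting-by-dimension-name rule described in doc/data-structures.rst above can be illustrated with plain NumPy. This is only a sketch of the idea: `broadcast_by_name` is a hypothetical helper written for this example and is not part of xray.

```python
import numpy as np


def broadcast_by_name(dims_a, a, dims_b, b):
    """Return (dims, a2, b2) so that a2 and b2 broadcast by dimension name."""
    # Result dimensions: those of `a` first, then any new ones from `b`.
    dims = tuple(dims_a) + tuple(d for d in dims_b if d not in dims_a)

    def expand(arr_dims, arr):
        arr = np.asarray(arr)
        # Put the existing axes into result order...
        present = [d for d in dims if d in arr_dims]
        arr = arr.transpose([arr_dims.index(d) for d in present])
        # ...then add a length-1 axis for every dimension the array lacks.
        shape = [arr.shape[present.index(d)] if d in present else 1
                 for d in dims]
        return arr.reshape(shape)

    return dims, expand(list(dims_a), a), expand(list(dims_b), b)


time = np.arange(3)          # dimensions ('time',)
space = np.arange(4) * 10    # dimensions ('space',)
dims, t, s = broadcast_by_name(('time',), time, ('space',), space)
total = t + s                # dimensions ('time', 'space'), shape (3, 4)
```

Because axes are matched by name rather than by position, the same helper also lines up a `('y', 'x')` array against an `('x', 'y')` array by transposing it first.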
16 changes: 8 additions & 8 deletions test/__init__.py
@@ -36,14 +36,14 @@ def requires_netCDF4(test):


class TestCase(unittest.TestCase):
def assertXArrayEqual(self, v1, v2):
self.assertTrue(utils.xarray_equal(v1, v2))
def assertVariableEqual(self, v1, v2):
self.assertTrue(utils.variable_equal(v1, v2))

def assertXArrayAllClose(self, v1, v2, rtol=1e-05, atol=1e-08):
self.assertTrue(utils.xarray_allclose(v1, v2, rtol=rtol, atol=atol))
def assertVariableAllClose(self, v1, v2, rtol=1e-05, atol=1e-08):
self.assertTrue(utils.variable_allclose(v1, v2, rtol=rtol, atol=atol))

def assertXArrayNotEqual(self, v1, v2):
self.assertFalse(utils.xarray_equal(v1, v2))
def assertVariableNotEqual(self, v1, v2):
self.assertFalse(utils.variable_equal(v1, v2))

def assertArrayEqual(self, a1, a2):
assert_array_equal(a1, a2)
@@ -56,15 +56,15 @@ def assertDatasetEqual(self, d1, d2):
for k in d1:
v1 = d1.variables[k]
v2 = d2.variables[k]
self.assertXArrayEqual(v1, v2)
self.assertVariableEqual(v1, v2)

def assertDatasetAllClose(self, d1, d2, rtol=1e-05, atol=1e-08):
self.assertTrue(utils.dict_equal(d1.attributes, d2.attributes))
self.assertEqual(sorted(d1.variables), sorted(d2.variables))
for k in d1:
v1 = d1.variables[k]
v2 = d2.variables[k]
self.assertXArrayAllClose(v1, v2, rtol=rtol, atol=atol)
self.assertVariableAllClose(v1, v2, rtol=rtol, atol=atol)

def assertDataArrayEqual(self, ar1, ar2):
self.assertEqual(ar1.name, ar2.name)
2 changes: 1 addition & 1 deletion test/test_backends.py
@@ -136,7 +136,7 @@ def test_open_encodings(self):

actual = open_dataset(tmp_file)

self.assertXArrayEqual(actual['time'], expected['time'])
self.assertVariableEqual(actual['time'], expected['time'])
actual_encoding = {k: v for k, v in actual['time'].encoding.iteritems()
if k in expected['time'].encoding}
self.assertDictEqual(actual_encoding, expected['time'].encoding)
92 changes: 46 additions & 46 deletions test/test_data_array.py
@@ -2,19 +2,19 @@
from copy import deepcopy
from textwrap import dedent

from xray import Dataset, DataArray, XArray, align
from xray import Dataset, DataArray, Variable, align
from . import TestCase, ReturnItem


class TestDataArray(TestCase):
def setUp(self):
self.x = np.random.random((10, 20))
self.v = XArray(['x', 'y'], self.x)
self.v = Variable(['x', 'y'], self.x)
self.ds = Dataset({'foo': self.v})
self.dv = DataArray(self.ds, 'foo')
self.dv = self.ds['foo']

def test_repr(self):
v = XArray(['time', 'x'], [[1, 2, 3], [4, 5, 6]], {'foo': 'bar'})
v = Variable(['time', 'x'], [[1, 2, 3], [4, 5, 6]], {'foo': 'bar'})
data_array = Dataset({'my_variable': v})['my_variable']
expected = dedent("""
<xray.DataArray 'my_variable' (time: 2, x: 3)>
@@ -28,13 +28,13 @@ def test_repr(self):
def test_properties(self):
self.assertIs(self.dv.dataset, self.ds)
self.assertEqual(self.dv.name, 'foo')
self.assertXArrayEqual(self.dv.variable, self.v)
self.assertArrayEqual(self.dv.data, self.v.data)
self.assertVariableEqual(self.dv.variable, self.v)
self.assertArrayEqual(self.dv.values, self.v.values)
for attr in ['dimensions', 'dtype', 'shape', 'size', 'ndim',
'attributes']:
self.assertEqual(getattr(self.dv, attr), getattr(self.v, attr))
self.assertEqual(len(self.dv), len(self.v))
self.assertXArrayEqual(self.dv, self.v)
self.assertVariableEqual(self.dv, self.v)
self.assertEqual(list(self.dv.coordinates), list(self.ds.coordinates))
for k, v in self.dv.coordinates.iteritems():
self.assertArrayEqual(v, self.ds.coordinates[k])
@@ -48,20 +48,20 @@ def test_items(self):
self.assertDataArrayEqual(self.dv, self.ds['foo'])
x = self.dv['x']
y = self.dv['y']
self.assertDataArrayEqual(DataArray(self.ds, 'x'), x)
self.assertDataArrayEqual(DataArray(self.ds, 'y'), y)
self.assertDataArrayEqual(self.ds['x'], x)
self.assertDataArrayEqual(self.ds['y'], y)
# integer indexing
I = ReturnItem()
for i in [I[:], I[...], I[x.data], I[x.variable], I[x], I[x, y],
I[x.data > -1], I[x.variable > -1], I[x > -1],
for i in [I[:], I[...], I[x.values], I[x.variable], I[x], I[x, y],
I[x.values > -1], I[x.variable > -1], I[x > -1],
I[x > -1, y > -1]]:
self.assertXArrayEqual(self.dv, self.dv[i])
self.assertVariableEqual(self.dv, self.dv[i])
for i in [I[0], I[:, 0], I[:3, :2],
I[x.data[:3]], I[x.variable[:3]], I[x[:3]], I[x[:3], y[:4]],
I[x.data > 3], I[x.variable > 3], I[x > 3], I[x > 3, y > 3]]:
self.assertXArrayEqual(self.v[i], self.dv[i])
I[x.values[:3]], I[x.variable[:3]], I[x[:3]], I[x[:3], y[:4]],
I[x.values > 3], I[x.variable > 3], I[x > 3], I[x > 3, y > 3]]:
self.assertVariableEqual(self.v[i], self.dv[i])
# make sure we always keep the array around, even if it's a scalar
self.assertXArrayEqual(self.dv[0, 0], self.dv.variable[0, 0])
self.assertVariableEqual(self.dv[0, 0], self.dv.variable[0, 0])
self.assertEqual(self.dv[0, 0].dataset,
Dataset({'foo': self.dv.variable[0, 0]}))

@@ -84,9 +84,9 @@ def test_loc(self):
self.assertDataArrayEqual(self.dv[1], self.dv.loc['b'])
self.assertDataArrayEqual(self.dv[:3], self.dv.loc[['a', 'b', 'c']])
self.assertDataArrayEqual(self.dv[:3, :4],
self.dv.loc[['a', 'b', 'c'], np.arange(4)])
self.dv.loc[['a', 'b', 'c'], np.arange(4)])
self.dv.loc['a':'j'] = 0
self.assertTrue(np.all(self.dv.data == 0))
self.assertTrue(np.all(self.dv.values == 0))

def test_rename(self):
renamed = self.dv.rename('bar')
@@ -105,14 +105,14 @@ def test_array_interface(self):
self.assertArrayEqual(np.asarray(self.dv), self.x)
# test patched in methods
self.assertArrayEqual(self.dv.take([2, 3]), self.v.take([2, 3]))
self.assertXArrayEqual(self.dv.argsort(), self.v.argsort())
self.assertXArrayEqual(self.dv.clip(2, 3), self.v.clip(2, 3))
self.assertVariableEqual(self.dv.argsort(), self.v.argsort())
self.assertVariableEqual(self.dv.clip(2, 3), self.v.clip(2, 3))
# test ufuncs
expected = deepcopy(self.ds)
expected['foo'][:] = np.sin(self.x)
self.assertDataArrayEquiv(expected['foo'], np.sin(self.dv))
self.assertDataArrayEquiv(self.dv, np.maximum(self.v, self.dv))
bar = XArray(['x', 'y'], np.zeros((10, 20)))
bar = Variable(['x', 'y'], np.zeros((10, 20)))
self.assertDataArrayEquiv(self.dv, np.maximum(self.dv, bar))

def test_math(self):
@@ -132,7 +132,7 @@ def test_math(self):
self.assertDataArrayEquiv(a, 0 * a + a)
# test different indices
ds2 = self.ds.update({'x': ('x', 3 + np.arange(10))}, inplace=False)
b = DataArray(ds2, 'foo')
b = ds2['foo']
with self.assertRaisesRegexp(ValueError, 'not aligned'):
a + b
with self.assertRaisesRegexp(ValueError, 'not aligned'):
@@ -181,13 +181,13 @@ def test_coord_math(self):

def test_item_math(self):
self.ds['x'] = ('x', np.array(list('abcdefghij')))
self.assertXArrayEqual(self.dv + self.dv[0, 0],
self.dv + self.dv[0, 0].data)
self.assertVariableEqual(self.dv + self.dv[0, 0],
self.dv + self.dv[0, 0].values)
new_data = self.x[0][None, :] + self.x[:, 0][:, None]
self.assertXArrayEqual(self.dv[:, 0] + self.dv[0],
XArray(['x', 'y'], new_data))
self.assertXArrayEqual(self.dv[0] + self.dv[:, 0],
XArray(['y', 'x'], new_data.T))
self.assertVariableEqual(self.dv[:, 0] + self.dv[0],
Variable(['x', 'y'], new_data))
self.assertVariableEqual(self.dv[0] + self.dv[:, 0],
Variable(['y', 'x'], new_data.T))

def test_inplace_math(self):
x = self.x
@@ -197,18 +197,18 @@ def test_inplace_math(self):
b += 1
self.assertIs(b, a)
self.assertIs(b.variable, v)
self.assertIs(b.data, x)
self.assertIs(b.values, x)
self.assertIs(b.dataset, self.ds)

def test_transpose(self):
self.assertXArrayEqual(self.dv.variable.transpose(),
self.assertVariableEqual(self.dv.variable.transpose(),
self.dv.transpose())

def test_squeeze(self):
self.assertXArrayEqual(self.dv.variable.squeeze(), self.dv.squeeze())
self.assertVariableEqual(self.dv.variable.squeeze(), self.dv.squeeze())

def test_reduce(self):
self.assertXArrayEqual(self.dv.reduce(np.mean, 'x'),
self.assertVariableEqual(self.dv.reduce(np.mean, 'x'),
self.v.reduce(np.mean, 'x'))
# needs more...
# should check which extra dimensions are dropped
@@ -217,12 +217,12 @@ def test_groupby_iter(self):
for ((act_x, act_dv), (exp_x, exp_ds)) in \
zip(self.dv.groupby('y'), self.ds.groupby('y')):
self.assertEqual(exp_x, act_x)
self.assertDataArrayEqual(DataArray(exp_ds, 'foo'), act_dv)
self.assertDataArrayEqual(exp_ds['foo'], act_dv)
for ((_, exp_dv), act_dv) in zip(self.dv.groupby('x'), self.dv):
self.assertDataArrayEqual(exp_dv, act_dv)

def test_groupby(self):
agg_var = XArray(['y'], np.array(['a'] * 9 + ['c'] + ['b'] * 10))
agg_var = Variable(['y'], np.array(['a'] * 9 + ['c'] + ['b'] * 10))
self.dv['abc'] = agg_var
self.dv['y'] = 20 + 100 * self.ds['y'].variable

@@ -236,45 +236,45 @@ def test_groupby(self):
self.assertDataArrayEqual(expected, actual)

grouped = self.dv.groupby('abc', squeeze=True)
expected_sum_all = DataArray(Dataset(
{'foo': XArray(['abc'], np.array([self.x[:, :9].sum(),
expected_sum_all = Dataset(
{'foo': Variable(['abc'], np.array([self.x[:, :9].sum(),
self.x[:, 10:].sum(),
self.x[:, 9:10].sum()]).T,
{'cell_methods': 'x: y: sum'}),
'abc': XArray(['abc'], np.array(['a', 'b', 'c']))}), 'foo')
'abc': Variable(['abc'], np.array(['a', 'b', 'c']))})['foo']
self.assertDataArrayAllClose(
expected_sum_all, grouped.reduce(np.sum, dimension=None))
self.assertDataArrayAllClose(
expected_sum_all, grouped.sum(dimension=None))
self.assertDataArrayAllClose(
expected_sum_all, grouped.sum(axis=None))
expected_unique = XArray('abc', ['a', 'b', 'c'])
self.assertXArrayEqual(expected_unique, grouped.unique_coord)
expected_unique = Variable('abc', ['a', 'b', 'c'])
self.assertVariableEqual(expected_unique, grouped.unique_coord)
self.assertEqual(3, len(grouped))

grouped = self.dv.groupby('abc', squeeze=False)
self.assertDataArrayAllClose(
expected_sum_all, grouped.sum(dimension=None))

expected_sum_axis1 = DataArray(Dataset(
{'foo': XArray(['x', 'abc'], np.array([self.x[:, :9].sum(1),
expected_sum_axis1 = Dataset(
{'foo': Variable(['x', 'abc'], np.array([self.x[:, :9].sum(1),
self.x[:, 10:].sum(1),
self.x[:, 9:10].sum(1)]).T,
{'cell_methods': 'y: sum'}),
'x': self.ds.variables['x'],
'abc': XArray(['abc'], np.array(['a', 'b', 'c']))}), 'foo')
'abc': Variable(['abc'], np.array(['a', 'b', 'c']))})['foo']
self.assertDataArrayAllClose(expected_sum_axis1, grouped.reduce(np.sum))
self.assertDataArrayAllClose(expected_sum_axis1, grouped.sum())
self.assertDataArrayAllClose(expected_sum_axis1, grouped.sum('y'))

def test_concat(self):
self.ds['bar'] = XArray(['x', 'y'], np.random.randn(10, 20))
self.ds['bar'] = Variable(['x', 'y'], np.random.randn(10, 20))
foo = self.ds['foo'].select()
bar = self.ds['bar'].rename('foo').select()
# from dataset array:
self.assertXArrayEqual(XArray(['w', 'x', 'y'],
np.array([foo.data, bar.data])),
DataArray.concat([foo, bar], 'w'))
self.assertVariableEqual(Variable(['w', 'x', 'y'],
np.array([foo.values, bar.values])),
DataArray.concat([foo, bar], 'w'))
# from iteration:
grouped = [g for _, g in foo.groupby('x')]
stacked = DataArray.concat(grouped, self.ds['x'])
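
The `test_items` hunk above builds its index keys with `I = ReturnItem()`. `ReturnItem` is imported from the test package, but its definition does not appear in this diff. A plausible reading, given here purely as an assumption, is a helper whose `__getitem__` returns whatever key it receives, so that slice syntax can be captured and reused:

```python
import numpy as np


class ReturnItem:
    """Assumed behaviour: indexing returns the key itself."""

    def __getitem__(self, key):
        return key


I = ReturnItem()
x = np.arange(20).reshape(4, 5)
for key in [I[:], I[...], I[0], I[:, 0], I[:3, :2]]:
    # Each captured key can be applied to any array-like object.
    print(repr(key), x[key].shape)
```

This lets a test loop over many indexing expressions and apply each one to both a `Variable` and a `DataArray`, as `test_items` does.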