Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pytest assert functions #1147

Merged
merged 19 commits into from
Dec 22, 2016
Merged

Pytest assert functions #1147

merged 19 commits into from
Dec 22, 2016

Conversation

max-sixty
Copy link
Collaborator

Is this a reasonable function to use with py.test? In place of the inheritance .assertVariableEqual etc.

  • Do we want separate functions for each class?
  • There's a comment or just switch to py.test and add an appropriate hook. - is that different to what's here?
  • This still needs to be completed, with additional functions for assert _identical and assert_approx_equal.
  • Do we want more description for the failure, above just printing the arguments? That can also be added later

@shoyer
Copy link
Member

shoyer commented Dec 2, 2016

Do we want separate functions for each class?

That seems like overkill -- I think one is fine.

There's a comment or just switch to py.test and add an appropriate hook. - is that different to what's here?

I was thinking of something like how python lets you override the message for assert a == b. But it actually looks like there is no suitable hook for custom explanations when using methods. __traceback__ = False, like you do here, is about the best you can do.

Do we want more description for the failure, above just printing the arguments? That can also be added later

It's nice to at least print the objects on separate lines, e.g., using something like '%r\n%r' % (a, b) for the assert message. Otherwise multi-line reprs end up with the start of the second object on the same line as the first.

assert as_variable(a).equals(b), (a, b)
elif isinstance(a, xr.Dataset):
assert a.equals(b), (a, b)
elif isinstance(a, dict): # coords
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coords are actually not dict subclasses, so this won't work. They do subclass collections.Mapping, though.

import xarray as xr
___tracebackhide__ = True
assert type(a) == type(b)
if isinstance(a, xr.DataArray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have this mixed up with the Dataset clause?

assert_equal(a.data_vars, b.data_vars)
assert_equal(a.coords, b.coords)
elif isinstance(a, xr.Variable):
assert as_variable(a).equals(b), (a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to convert a to a variable -- it already is one.

elif isinstance(a, xr.Variable):
assert as_variable(a).equals(b), (a, b)
elif isinstance(a, xr.Dataset):
assert a.equals(b), (a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this (assert a.equals(b), (a, b)) would actually probably suffice for all of Dataset, DataArray and Variable.

@max-sixty
Copy link
Collaborator Author

Updated.

Is it worth 'testing the tests'? I couldn't find tests for the existing functions

assert sorted(a, key=str) == sorted(a, key=str)
assert_xarray_equal(a.coords, b.coords)
[assert_xarray_close(
a.variables[k], b.variables[k], rtol=1e-05, atol=1e-08)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use rtol=rtol, atol=atol here

assert_xarray_equal(a.data_vars, b.data_vars)
assert_xarray_equal(a.coords, b.coords)
elif isinstance(a, (xr.Variable, xr.DataArray, xr.Coordinate)):
assert a.equals(b), '{}/n{}'.format(a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use \n not /n here and below

if isinstance(a, xr.Dataset):
assert_xarray_equal(a.data_vars, b.data_vars)
assert_xarray_equal(a.coords, b.coords)
elif isinstance(a, (xr.Variable, xr.DataArray, xr.Coordinate)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coordinate (actually IndexVariable, now) is a Variable subclass, so you don't need to call it out separately

import xarray as xr
___tracebackhide__ = True
assert type(a) == type(b)
if isinstance(a, xr.Dataset):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use .equals for Dataset, as well -- no need to handle data_vars and coord separately

assert_xarray_equal(a.coords, b.coords)
elif isinstance(a, (xr.Variable, xr.DataArray, xr.Coordinate)):
assert a.equals(b), '{}/n{}'.format(a, b)
elif isinstance(a, xr.core.coordinates.AbstractCoordinates):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we really need need this case

@max-sixty
Copy link
Collaborator Author

max-sixty commented Dec 8, 2016

Thanks for the feedback! Updated.

@max-sixty max-sixty changed the title RFC for initial pytest assert function Pytest assert functions Dec 9, 2016
assert type(a) == type(b)
if isinstance(a, (xr.Variable, xr.DataArray, xr.Dataset)):
assert a.equals(b), '{}\n{}'.format(a, b)
elif isinstance(a, xr.core.coordinates.AbstractCoordinates):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is needed for coordinates - let me know if there's a better way

@max-sixty
Copy link
Collaborator Author

@shoyer green

@shoyer
Copy link
Member

shoyer commented Dec 15, 2016

This looks good to me. The last thing I would do is switch the existing test methods like assertDatasetEqual to use these functions, just to reduce the amount of redundant code (and also test your new functions more extensively).

@max-sixty
Copy link
Collaborator Author

The last thing I would do is switch the existing test methods like assertDatasetEqual to use these functions, just to reduce the amount of redundant code (and also test your new functions more extensively).

All of tests? I think that's a fairly heavy lift for this PR. It also happens to be really tedious and unfortunately I don't think Find / Replace-able... If you feel strongly I'll plug in and go through though!

Otherwise we can roll out over time as people write new tests

@shoyer
Copy link
Member

shoyer commented Dec 15, 2016 via email

raise TypeError('{} not supported by assertion comparison'
.format(type(a)))

def assert_xarray_close(a, b, rtol=1e-05, atol=1e-08):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be slightly more consistent to call this allclose rather than close.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allclose but not allequal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of the functions in numpy.testing

@shoyer
Copy link
Member

shoyer commented Dec 15, 2016

You might also add these to the API docs page and what's new. I know there has been interest in public test functions for quite some time.

@fmaussion
Copy link
Member

I know there has been interest in public test functions for quite some time.

Yes, see also #754

@max-sixty
Copy link
Collaborator Author

@shoyer do you have ideas for when the types are not the same? For example, many of the tests compare Variable to IndexVariable. Should we not check types? Add a check_type=True argument to the testing functions? Change the tests so they're compliant?

@shoyer
Copy link
Member

shoyer commented Dec 15, 2016

In most cases I would guess these are broken tests?

@max-sixty
Copy link
Collaborator Author

There are more than a dozen: https://travis-ci.org/pydata/xarray/jobs/184356287#L2611

Also some binary / unicode errors - let me know if you have context

assert_xarray_equal(a.coords, b.coords)
elif isinstance(a, xr.Dataset):
assert sorted(a, key=str) == sorted(a, key=str)
assert_xarray_equal(a.coords, b.coords)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is why many (not all) of the new tests are failing (at least the unicode/bytes ones, because we have a hack that says unicode/bytes can still be "close" if they encode the same string). Previously we used allclose for all variables, now coords are required to be equal.

I can see why this makes sense, but maybe better to keep with what we have now or add a keyword argument to switch.

@max-sixty
Copy link
Collaborator Author

max-sixty commented Dec 21, 2016

@shoyer I fixed most of the tests. Will come back and check the remaining few failures

I didn't quite realize how many .assertVariable... tests were relying on coercion, so I ended up changing the tests in a lot of places. The test is explicit at the test site, at least.

There were a couple that I think are probably breaks - things like .argsort changing the type of a Variable. I marked those with comments but haven't fixed them - or we'll never get this in! Let me know if you have thoughts on any notes

@max-sixty
Copy link
Collaborator Author

Rebased on #1175, so that should be merged first if people agree



def requires_netCDF4(test):
return test if has_netCDF4 else unittest.skip('requires netCDF4')(test)
return test if has_netCDF4 else pytest.mark.skip('requires dask')(test)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dask -> netCDF4

@fmaussion
Copy link
Member

This is great! Before merge it would be good to add the new standalone assert_* functions to the API doc?

@max-sixty
Copy link
Collaborator Author

@fmaussion had a go - is that what you were thinking?

@fmaussion
Copy link
Member

Thanks @MaximilianR , this will be useful for all libraries downstream of xarray

@spencerahill
Copy link
Contributor

This is great. At least in terms of the functionality I was looking for, I'd say this closes #754.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MaximilianR. I have a few more minor adjustments but this is very close.

[assert_xarray_allclose(a[k], b[k], rtol=rtol, atol=atol) for k in a.variables]
else:
# unsure if we should need this branch
# https://github.com/pydata/xarray/issues/1152
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comment in #1152, DataArray does not have a .variables attribute. I think we should update this case to test both .variable and everything in .coords.variables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️

elif isinstance(a, xr.Dataset):
assert sorted(a, key=str) == sorted(a, key=str)
[assert_xarray_allclose(a[k], b[k], rtol=rtol, atol=atol)
for k in list(a.variables) + list(a.coords)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor style point, but I find it a little unexpected to use a list comprehension when throwing away the result. I would write this with a normal for loop:

for k in list(a.variables) + list(a.coords):
    assert_xarray_allclose(a[k], b[k], rtol=rtol, atol=atol)

@@ -912,7 +913,12 @@ def test_cross_engine_read_write_netcdf3(self):
for read_engine in valid_engines:
with open_dataset(tmp_file,
engine=read_engine) as actual:
self.assertDatasetAllClose(data, actual)
# hack to allow test to work:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, but we should look into this later because it looks like a legit bug to me.

It might be worth making a dedicated VariablesDict that is simply an OrderedDict with runtime type checking that verifies you can never put in anything other than a Variable. Or we should use pytype for static checks to ensure Dataset._variables and DataArray._coords are typed typing.OrderedDict[Any, xarray.Variable].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it's a bug


# should we need `.to_base_variable()`?
# probably a break that `+v` changes type?
v = self.cls(['x'], x).to_base_variable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this it's OK that any arithmetic converts IndexVariable to Variable. That seems pretty consistent to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of always converting v to a base variable, I would make another copy that is the base variable and using that for the expected variable. Thus these tests will still check that math with IndexVariable works even though it doesn't preserve the type.

for actual in [expected.T,
expected[...],
expected.squeeze(),
expected.isel(x=slice(None)),
expected.expand_dims({'x': 3}),
expected.copy(deep=True),
expected.copy(deep=False)]:

self.assertVariableIdentical(expected, actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use .to_base_variable() in the assert here (on both sides). Otherwise this test doesn't have any coverage for IndexVariable objects.

@@ -304,22 +309,23 @@ def test_equals_all_dtypes(self):
def test_eq_all_dtypes(self):
# ensure that we don't choke on comparisons for which numpy returns
# scalars
expected = self.cls('x', 3 * [False])
expected = self.cls('x', 3 * [False]).to_base_variable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you can just use Variable('x', 3 * [False]) in this case and other ones like it, which is maybe a little clearer.

@max-sixty
Copy link
Collaborator Author

@shoyer green!

for v, _ in self.example_1d_objects():
actual = 'z' == v
self.assertVariableIdentical(expected, actual)
actual = ~('z' != v)
self.assertVariableIdentical(expected, actual)

def test_encoding_preserved(self):
expected = self.cls('x', range(3), {'foo': 1}, {'bar': 2}).to_base_variable()
expected = Variable('x', range(3), {'foo': 1}, {'bar': 2})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be self.cls, not Variable.

@max-sixty
Copy link
Collaborator Author

@shoyer green!

@shoyer shoyer merged commit 9dcfa73 into pydata:master Dec 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants