This repository has been archived by the owner on Mar 20, 2018. It is now read-only.

Re-raise unrecognized exceptions on retry. #168

Open
wants to merge 7 commits into master

Conversation

lukesneeringer
Contributor

Right now, if our retry logic encounters an exception it does not recognize, it raises RetryError which wraps the original exception.

This probably is not what we want. The end result is that the RetryError will end up masking the actual exception that needs to be fixed. The developer can still get at it (by calling .cause.exception()), but that is not clear or documented anywhere and there is no intuitive reason to try it.

I assert that it would be better to just raise the original exception, in the original context, and reserve RetryError for errors that the retry logic fundamentally understands the nature of (e.g. hitting the max retry limit).
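
A minimal, self-contained sketch of the proposed policy, with illustrative names rather than this PR's actual diff:

RETRYABLE_CODES = frozenset(['UNAVAILABLE', 'DEADLINE_EXCEEDED'])


def exc_to_code(exception):
    # Stand-in for config.exc_to_code(): RPC-style exceptions expose a
    # .code() method; anything else maps to None.
    code = getattr(exception, 'code', None)
    return code() if callable(code) else None


def call_once(to_call):
    try:
        return to_call()
    except Exception as exception:  # pylint: disable=broad-except
        if exc_to_code(exception) not in RETRYABLE_CODES:
            # Unrecognized exception: re-raise it as-is in its original
            # context rather than wrapping it in a RetryError.
            raise
        # Recognized, retryable code: the real retry loop would back off
        # and call again here, reserving RetryError for the deadline case.
        return None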

If accepted, this PR:

Contributor

@geigerj geigerj left a comment


I am opposed to this change.

Back when Tim was Python TL, there were assertions that:

  • Simple users of the GAPIC should not need to know that the gRPC layer exists. Therefore, they should never encounter gRPC errors directly. They should be able to assert that if an RPC error occurs while calling a GAPIC method, it will be a GaxError. (Meanwhile, advanced users can still access the underlying gRPC error via the attributes of the GaxError.)

  • One should be able to configure any transport layer of GAX without changing its behavior. For example, if we implemented GAX support for HTTP/JSON, an RPC error that occurs during retrying should still be a GaxError.

  • The impact of both of these is that gRPC errors are never raised from GAX, since ApiCallables are always wrapped to ensure they always raise GaxErrors when an exception occurs. It is inconsistent to return the underlying error only in the retry case.

I'd like to be convinced that these no longer hold before we go ahead with this.

@@ -120,10 +120,11 @@ def inner(*args):
        return to_call(*args)
    except Exception as exception:  # pylint: disable=broad-except
        code = config.exc_to_code(exception)

        # Do not be greedy; if this is an exception that we do
Contributor


nit: I'm not sure "do not be greedy" is going to make sense to someone reading this code without knowledge of its history.

@geigerj
Contributor

geigerj commented Mar 2, 2017

FWIW, I think the "right" solution to this problem is to

  • Make sure developers know of the existence of GaxError.cause.
  • Intervene at the handwritten google-cloud-python level in cases where we don't want users to encounter a GaxError.

@lukesneeringer
Contributor Author

lukesneeringer commented Mar 2, 2017

@geigerj Good points, all. :-) I am not sure I am convinced. I need to figure out the right thing to say to GoogleCloudPlatform/python-docs-samples#833, so let's figure out what that is. My best guess is that my change might not be where we end up, but the status quo is probably wrong also.

Simple users of the GAPIC should not need to know that the gRPC layer exists. Therefore, they should never encounter gRPC errors directly. They should be able to assert that if an RPC error occurs while calling a GAPIC method, it will be a GaxError.

My gut tells me that this requirement is correct, at least as a design principle. This also has the advantage that if you want to, you can except GaxError and expect to get most/all things.

I am not as sure that it is correct in "unknown" cases (which is what this is): fundamentally, the exception here is something our code was not ready for and literally caught with except Exception. So the question is: which is better, to send up a "wrapped" version of the exception (with a non-obvious API), or to send up the unrecognized exception to the caller?

I could see a pretty strong case that if we get gRPC errors we should be able to handle but whose code is unrecognized, we treat those as bugs that need to be wrapped in a GaxError and given an appropriate, helpful error message.

Bottom line: I agree with you that we should avoid sending up transport-specific errors as a general practice. But, I am less sure that handling unknown cases with a sword this broad is wise.

I am intentionally inviting debate / disagreement here. Tell me I am wrong. :-)

(Meanwhile, advanced users can still access the underlying gRPC error via the attributes of the GaxError.)

Do we document how to do this?
(If not, where is the right place where it would be found? Our docs are kind of all over the place right now.)
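
A hedged sketch of the kind of snippet such documentation might contain, assuming google.gax.errors.GaxError keeps the wrapped exception on its cause attribute (as referenced throughout this thread); flaky_call is a stand-in for a GAPIC method:

from google.gax import errors


def flaky_call():
    # A real GAPIC method would do this wrapping for us.
    raise errors.GaxError('RPC failed', cause=ValueError('underlying error'))


try:
    flaky_call()
except errors.GaxError as exc:
    print('API call failed: %s' % exc)
    # Advanced use: inspect the original transport-level exception.
    print('Underlying error: %r' % exc.cause)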

One should be able to configure any transport layer of GAX without changing its behavior. For example, if we implemented GAX support for HTTP/JSON, an RPC error that occurs during retrying should still be a GaxError.

Again, this sounds right as a design principle; I am unsure it should apply to unknown cases.

The impact of both of these is that gRPC errors are never raised from GAX, since ApiCallables are always wrapped to ensure they always raise GaxErrors when an exception occurs. It is inconsistent to return the underlying error only in the retry case.

I think you are wrong here. As best as I can read this code, to_catch is a whitelist. It will not catch any old error; it will only catch the errors it knows how to deal with. The proposed change would make retry errors act more like this code, would it not?

@lukesneeringer
Contributor Author

Update: If we go with something like my strategy here, I am fairly certain that we would still consider GoogleCloudPlatform/python-docs-samples#833 to be a bug; the bug would be that we are not appropriately catching the code being sent back by the Vision API.

@lukesneeringer
Contributor Author

@jonparrott Could you add thoughts on this question? (Social note: I explicitly confirmed with Jacob that adding another voice here would be instructive.)

@geigerj
Contributor

geigerj commented Mar 2, 2017

Summary of offline discussion with @lukesneeringer:

  • We should not wrap all exceptions without any knowledge of what we're wrapping here.
  • We should aim to preserve the original design intention of raising GaxError in general to indicate API errors, regardless of the transport layer.
  • There is already a mechanism for doing this, used by api_callable and configured by config.API_ERRORS.
  • We can reuse this mechanism here to ensure that we are wrapping API errors, so that users don't have to worry about transport when writing exception-handling code, without wrapping all exceptions (see the sketch after this list).
  • We should include the string representation of the wrapped exception in the GaxError message so that it is recorded when users log their exceptions.
  • If the documentation of GaxError.cause isn't sufficient or visible enough, improve it.
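
A minimal sketch of that approach, assuming config.API_ERRORS is the tuple of transport-level error classes and that errors.RetryError accepts a cause keyword (both appear elsewhere in this PR); the real change may differ in detail:

from google.gax import config, errors


def handle_retry_deadline(exception):
    # Sketch: called once retrying has run out of time.
    if isinstance(exception, config.API_ERRORS):
        # A transport-level API error: wrap it so callers only need to
        # handle GaxError/RetryError, and include its string form in the
        # message so ordinary logging still records it.
        raise errors.RetryError(
            'Retry total timeout exceeded with exception: %s' % exception,
            cause=exception)
    # Anything else is not an API error and is not ours to wrap.
    raise exception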

@theacodes

I agree with where you guys ended up; let me reiterate:

  1. Known (non-exceptional) transport exceptions should be wrapped.
  2. Unknown (exceptional) exceptions, such as KeyboardInterrupt, OSError, etc., should bubble up as-is.
  3. GaxError.cause should be better documented.
  4. We should be aware that exception chaining in Python 3 will help things, and make sure our code doesn't obscure that (a small illustration follows this list).
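
Not from the PR, but a small self-contained illustration of point 4: in Python 3, raise ... from keeps the original exception attached as __cause__, so the printed traceback shows both errors, and wrapping code only has to avoid patterns that discard it:

class WrappedError(Exception):
    pass


def transport_call():
    # Stand-in for a failing transport-level call.
    raise ConnectionError('socket closed')


try:
    transport_call()
except ConnectionError as exc:
    # The ConnectionError is preserved as WrappedError.__cause__ and printed
    # under "The above exception was the direct cause of the following
    # exception".
    raise WrappedError('API call failed: %s' % exc) from exc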

As an aside, I super hate GaxError just on name alone. I feel like if I'm using google.cloud.something, I expect custom exceptions to be derived from something in google.cloud, maybe google.cloud.exceptions.GoogleCloudError. This is one of the many reasons I'd like to see gax and google-cloud-core converge; gax is a really unfortunate name for usability.

@lukesneeringer
Contributor Author

@geigerj @jonparrott Okay, this change has gotten more complex, but I think it is the better for your input. That said, I want to run this through a little more real-world testing before merging, since I had to modify the unit tests pretty drastically.

Assigned to @geigerj for review, but I want to justify a couple things in advance (and, of course, pushback is welcome if you think I am wrong):

(1) I went from mocking exc_to_code to creating an RpcError subclass which accepts a code and handles exc_to_code correctly. The reason for this was to make it a little easier to reason about how our cases have changed: we now recognize that we might get RpcError instances (which have a .code() method) and unrelated exceptions (which do not). I am up for considering alternative implementations, but this one seems easiest to grok. (A sketch of roughly what such a helper might look like follows point (3) below.)

(2) I changed every mock_call variable in test_retry.py to mock_func. This was to get past something that I found personally confusing; MagicMock has a mock_calls attribute that I use often, and it was confusing to me that mock_call was not actually a mock.Call.

(3) CustomException is now used for non-RPC exceptions; e.g. ones that should bubble up.
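
For reference, a sketch of roughly what such a helper might look like (the class in the actual diff may differ); it assumes config.API_ERRORS[0] is the gRPC error base class and that config.exc_to_code() reads the code via .code():

from google.gax import config


class MockGrpcException(config.API_ERRORS[0]):
    """A fake RPC error carrying an explicit status code, so that
    config.exc_to_code() resolves it the same way it would a real one."""

    def __init__(self, code):
        super(MockGrpcException, self).__init__()
        self._code = code

    def code(self):
        return self._code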

__all__ = ['CustomException', 'GaxError', 'MockGrpcException', 'RetryError']


class MockGrpcException(config.API_ERRORS[0]):


you could use mock.Mock(spec=existing_exception_class) if you wanted.

Contributor Author


lukesneeringer@lukesneeringer0:~/Code/Google/gax-python:reraise-unknown-retry-exc$ python3
Python 3.5.2 (default, Jan  5 2017, 10:47:09) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from unittest import mock
>>> MockValueError = mock.Mock(spec=ValueError)
>>> 
>>> try:
...     raise MockValueError('Oops!')
... except ValueError:
...     print('caught correctly')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: exceptions must derive from BaseException
>>> 

Contributor Author


Additionally, according to the mock documentation, trying to play with __subclasscheck__ is not supported.

return self._code


class CustomException(Exception):


Why not just use a built-in exception class instead?

Contributor Author


Inertia. CustomException already existed (I just moved it). I think that would be an improvement, though, and it is easy to change. :-)


Cool, I think that makes the intent of the test clearer. Thank you.

    'Retry timeout exceeded; exception: %s' % exception,
    cause=exception,
)
six.reraise(errors.RetryError, exc, sys.exc_info()[2])
Contributor Author


@geigerj This is something I would like a second opinion on -- what I am doing here is raising the RetryError with the given message (which now uses the exception's message), but maintaining the original traceback. I assert this is more useful for developers, and reduces the need to make people care about .cause.exception.
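
Not the live-API traceback requested below, but a minimal, self-contained illustration of the six.reraise pattern: the frame where the original exception was raised still appears at the bottom of the printed traceback, even though the exception type is now the wrapper:

import sys

import six


class WrapperError(Exception):
    pass


def fail():
    raise ValueError('original problem')


try:
    fail()
except ValueError as exception:
    wrapped = WrapperError('wrapped: %s' % exception)
    # Re-raise the wrapper, but attach the original traceback so the
    # "raise ValueError" line inside fail() remains visible.
    six.reraise(WrapperError, wrapped, sys.exc_info()[2])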


@lukesneeringer can I see an example traceback with and without this?

Contributor Author


Yeah. I will need a few minutes to piece together getting one from a live API, though.

Contributor Author


@jonparrott Heading to a meeting, so this will be delayed. I have not forgotten it.

Contributor


+1. Ideally it would also be possible to determine that the wrapped exception originated at this line of code.


__all__ = ['GaxError', 'MockGrpcException', 'RetryError']


class MockGrpcException(config.API_ERRORS[0]):
Contributor


Inheriting from config.API_ERRORS[0] feels a little awkward to me, but I guess it avoids mocking google.gax.config, which is what we were doing before?

try:
    # Create the callable and establish that we get a GaxError.
    my_callable = retry.retryable(mock_func, retry_options)
    with self.assertRaises(errors.GaxError):
Contributor


This is testing a weaker condition than before, since we're no longer checking the cause of the GaxError.
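
Not in the diff itself, but a self-contained sketch of the stronger assertion being described here: check both the wrapper type and its cause. The local GaxError stand-in assumes only that the real one keeps the wrapped exception on .cause:

import unittest


class GaxError(Exception):
    # Stand-in for google.gax.errors.GaxError: a message plus a cause.
    def __init__(self, msg, cause=None):
        super(GaxError, self).__init__(msg)
        self.cause = cause


def failing_call():
    raise GaxError('RetryError: deadline exceeded', cause=ValueError('boom'))


class CauseAssertionTest(unittest.TestCase):
    def test_wrapper_and_cause(self):
        with self.assertRaises(GaxError) as caught:
            failing_call()
        # The stronger condition: assert on the cause as well.
        self.assertIsInstance(caught.exception.cause, ValueError)


if __name__ == '__main__':
    unittest.main()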

try:
    # Establish that the custom exception is bubbled up (not wrapped), and
    # that the retryable function was called only once, not twice.
    with self.assertRaises(ValueError):
Contributor


Again, this is testing a weaker condition, since it ignores the cause.

Contributor Author


In this case, there is no cause. The original exception is being bubbled.

Contributor


Ah, right. Ignore that comment.

Successfully merging this pull request may close these issues.

RetryErrors obscure actual errors.
3 participants