This repository has been archived by the owner on Mar 20, 2018. It is now read-only.

Re-raise unrecognized exceptions on retry. #168

Open
wants to merge 7 commits into master

Conversation

lukesneeringer
Contributor

Right now, if our retry logic encounters an exception it does not recognize, it raises RetryError which wraps the original exception.

This probably is not what we want. The end result is that the RetryError will end up masking the actual exception that needs to be fixed. The developer can still get at it (by calling .cause.exception()), but that is not clear or documented anywhere and there is no intuitive reason to try it.

I assert that it would be better to just raise the original exception, in the original context, and reserve RetryError for errors that the retry logic fundamentally understands the nature of (e.g. hitting the max retry limit).
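
A minimal, self-contained sketch of the proposed policy, with illustrative names rather than this PR's actual diff:

RETRYABLE_CODES = frozenset(['UNAVAILABLE', 'DEADLINE_EXCEEDED'])


def exc_to_code(exception):
    # Stand-in for config.exc_to_code(): RPC-style exceptions expose a
    # .code() method; anything else maps to None.
    code = getattr(exception, 'code', None)
    return code() if callable(code) else None


def call_once(to_call):
    try:
        return to_call()
    except Exception as exception:  # pylint: disable=broad-except
        if exc_to_code(exception) not in RETRYABLE_CODES:
            # Unrecognized exception: re-raise it as-is in its original
            # context rather than wrapping it in a RetryError.
            raise
        # Recognized, retryable code: the real retry loop would back off
        # and call again here, reserving RetryError for the deadline case.
        return None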

If accepted, this PR:

Contributor

@geigerj geigerj left a comment


I am opposed to this change.

Back when Tim was Python TL, there were assertions that:

  • Simple users of the GAPIC should not need to know that the gRPC layer exists. Therefore, they should never encounter gRPC errors directly. They should be able to assert that if an RPC error occurs while calling a GAPIC method, it will be a GaxError. (Meanwhile, advanced users can still access the underlying gRPC error via the attributes of the GaxError.)

  • One should be able to configure any transport layer of GAX without changing its behavior. For example, if we implemented GAX support for HTTP/JSON, an RPC error that occurs during retrying should still be a GaxError.

  • The impact of both of these is that gRPC errors are never raised from GAX, since ApiCallables are always wrapped to ensure they always raise GaxErrors when an exception occurs. It is inconsistent to return the underlying error only in the retry case.

I'd like to be convinced that these no longer hold before we go ahead with this.

@@ -120,10 +120,11 @@ def inner(*args):
        return to_call(*args)
    except Exception as exception:  # pylint: disable=broad-except
        code = config.exc_to_code(exception)

        # Do not be greedy; if this is an exception that we do
Contributor


nit: I'm not sure "do not be greedy" is going to make sense to someone reading this code without knowledge of its history.

@geigerj
Contributor

geigerj commented Mar 2, 2017

FWIW, I think the "right" solution to this problem is to

  • Make sure developers know of the existence of GaxError.cause.
  • Intervene at the handwritten google-cloud-python level in cases where we don't want users to encounter a GaxError.

@lukesneeringer
Contributor Author

lukesneeringer commented Mar 2, 2017

@geigerj Good points, all. :-) I am not sure I am convinced. I need to figure out the right thing to say to GoogleCloudPlatform/python-docs-samples#833, so let's figure out what that is. My best guess is that my change might not be where we end up, but the status quo is probably wrong also.

Simple users of the GAPIC should not need to know that the gRPC layer exists. Therefore, they should never encounter gRPC errors directly. They should be able to assert that if an RPC error occurs while calling a GAPIC method, it will be a GaxError.

My gut tells me that this requirement is correct, at least as a design principle. This also has the advantage that if you want to, you can except GaxError and expect to get most/all things.

I am not as sure that it is correct in "unknown" cases (which is what this is): fundamentally, the exception here is something our code was not ready for and literally caught with except Exception. So the question is: which is better, to send up a "wrapped" version of the exception (with a non-obvious API), or to send up the unrecognized exception to the caller?

I could see a pretty strong case that if we get gRPC errors we should be able to handle but whose code is unrecognized, we treat those as bugs that need to be wrapped in a GaxError and given an appropriate, helpful error message.

Bottom line: I agree with you that we should avoid sending up transport-specific errors as a general practice. But, I am less sure that handling unknown cases with a sword this broad is wise.

I am intentionally inviting debate / disagreement here. Tell me I am wrong. :-)

(Meanwhile, advanced users can still access the underlying gRPC error via the attributes of the GaxError.)

Do we document how to do this?
(If not, where is the right place where it would be found? Our docs are kind of all over the place right now.)
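
A hedged sketch of the kind of snippet such documentation might contain, assuming google.gax.errors.GaxError keeps the wrapped exception on its cause attribute (as referenced throughout this thread); flaky_call is a stand-in for a GAPIC method:

from google.gax import errors


def flaky_call():
    # A real GAPIC method would do this wrapping for us.
    raise errors.GaxError('RPC failed', cause=ValueError('underlying error'))


try:
    flaky_call()
except errors.GaxError as exc:
    print('API call failed: %s' % exc)
    # Advanced use: inspect the original transport-level exception.
    print('Underlying error: %r' % exc.cause)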

One should be able to configure any transport layer of GAX without changing its behavior. For example, if we implemented GAX support for HTTP/JSON, an RPC error that occurs during retrying should still be a GaxError.

Again, this sounds right as a design principle; I am unsure it should apply to unknown cases.

The impact of both of these is that gRPC errors are never raised from GAX, since ApiCallables are always wrapped to ensure they always raise GaxErrors when an exception occurs. It is inconsistent to return the underlying error only in the retry case.

I think you are wrong here. As best as I can read this code, to_catch is a whitelist. It will not catch any old error; it will only catch the errors it knows how to deal with. The proposed change would make retry errors act more like this code, would it not?

@lukesneeringer
Contributor Author

Update: If we go with something like my strategy here, I am fairly certain that we would still consider GoogleCloudPlatform/python-docs-samples#833 to be a bug; the bug would be that we are not appropriately catching the code being sent back by the Vision API.

@lukesneeringer
Contributor Author

@jonparrott Could you add thoughts on this question? (Social note: I explicitly confirmed with Jacob that adding another voice here would be instructive.)

@geigerj
Contributor

geigerj commented Mar 2, 2017

Summary of offline discussion with @lukesneeringer:

  • We should not wrap all exceptions without any knowledge of what we're wrapping here.
  • We should aim to preserve the original design intention of raising GaxError in general to indicate API errors, regardless of the transport layer.
  • There is already a mechanism for doing this, used by api_callable and configured by config.API_ERRORS.
  • We can reuse this mechanism here to ensure that we are wrapping API errors, so that users don't have to worry about transport when writing exception-handling code, without wrapping all exceptions (see the sketch after this list).
  • We should include the string representation of the wrapped exception in the GaxError message so that it is recorded when users log their exceptions.
  • If the documentation of GaxError.cause isn't sufficient or visible enough, improve it.
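
A minimal sketch of that approach, assuming config.API_ERRORS is the tuple of transport-level error classes and that errors.RetryError accepts a cause keyword (both appear elsewhere in this PR); the real change may differ in detail:

from google.gax import config, errors


def handle_retry_deadline(exception):
    # Sketch: called once retrying has run out of time.
    if isinstance(exception, config.API_ERRORS):
        # A transport-level API error: wrap it so callers only need to
        # handle GaxError/RetryError, and include its string form in the
        # message so ordinary logging still records it.
        raise errors.RetryError(
            'Retry total timeout exceeded with exception: %s' % exception,
            cause=exception)
    # Anything else is not an API error and is not ours to wrap.
    raise exception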

@theacodes

I agree with where you guys ended up; let me reiterate:

  1. Known (non-exceptional) transport exceptions should be wrapped.
  2. Unknown (exceptional) exceptions, such as KeyboardInterrupt, OSError, etc., should bubble up as-is.
  3. GaxError.cause should be better documented.
  4. We should be aware that exception chaining in Python 3 will help things, and make sure our code doesn't obscure that (a small illustration follows this list).
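
Not from the PR, but a small self-contained illustration of point 4: in Python 3, raise ... from keeps the original exception attached as __cause__, so the printed traceback shows both errors, and wrapping code only has to avoid patterns that discard it:

class WrappedError(Exception):
    pass


def transport_call():
    # Stand-in for a failing transport-level call.
    raise ConnectionError('socket closed')


try:
    transport_call()
except ConnectionError as exc:
    # The ConnectionError is preserved as WrappedError.__cause__ and printed
    # under "The above exception was the direct cause of the following
    # exception".
    raise WrappedError('API call failed: %s' % exc) from exc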

As an aside, I super hate GaxError just on name alone. I feel like if I'm using google.cloud.something, I expect custom exceptions to be derived from something in google.cloud, maybe google.cloud.exceptions.GoogleCloudError. This is one of the many reasons I'd like to see gax and google-cloud-core converge; gax is a really unfortunate name for usability.

@lukesneeringer
Contributor Author

@geigerj @jonparrott Okay, this change has gotten more complex, but I think it is the better for your input. That said, I want to run this through a little more real-world testing before merging, since I had to modify the unit tests pretty drastically.

Assigned to @geigerj for review, but I want to justify a couple things in advance (and, of course, pushback is welcome if you think I am wrong):

(1) I went from mocking exc_to_code to creating an RpcError subclass which accepts a code and handles exc_to_code correctly. The reason for this was to make it a little easier to reason about how our cases have changed: we now recognize that we might get RpcError instances (which have a .code() method) and unrelated exceptions (which do not). I am up for considering alternative implementations, but this one seems easiest to grok. (A sketch of roughly what such a helper might look like follows point (3) below.)

(2) I changed every mock_call variable in test_retry.py to mock_func. This was to get past something that I found personally confusing; MagicMock has a mock_calls attribute that I use often, and it was confusing to me that mock_call was not actually a mock.Call.

(3) CustomException is now used for non-RPC exceptions; e.g. ones that should bubble up.
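
For reference, a sketch of roughly what such a helper might look like (the class in the actual diff may differ); it assumes config.API_ERRORS[0] is the gRPC error base class and that config.exc_to_code() reads the code via .code():

from google.gax import config


class MockGrpcException(config.API_ERRORS[0]):
    """A fake RPC error carrying an explicit status code, so that
    config.exc_to_code() resolves it the same way it would a real one."""

    def __init__(self, code):
        super(MockGrpcException, self).__init__()
        self._code = code

    def code(self):
        return self._code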

__all__ = ['CustomException', 'GaxError', 'MockGrpcException', 'RetryError']


class MockGrpcException(config.API_ERRORS[0]):


you could use mock.Mock(spec=existing_exception_class) if you wanted.

Contributor Author


lukesneeringer@lukesneeringer0:~/Code/Google/gax-python:reraise-unknown-retry-exc$ python3
Python 3.5.2 (default, Jan  5 2017, 10:47:09) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from unittest import mock
>>> MockValueError = mock.Mock(spec=ValueError)
>>> 
>>> try:
...     raise MockValueError('Oops!')
... except ValueError:
...     print('caught correctly')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: exceptions must derive from BaseException
>>> 

Contributor Author


Additionally, according to the mock documentation, trying to play with __subclasscheck__ is not supported.

return self._code


class CustomException(Exception):


Why not just use a built-in exception class instead?

Contributor Author


Inertia. CustomException already existed (I just moved it). I think that would be an improvement, though, and it is easy to change. :-)


Cool, I think that makes the intent of the test clearer. Thank you.

    'Retry timeout exceeded; exception: %s' % exception,
    cause=exception,
)
six.reraise(errors.RetryError, exc, sys.exc_info()[2])
Contributor Author


@geigerj This is something I would like a second opinion on -- what I am doing here is raising the RetryError with the given message (which now uses the exception's message), but maintaining the original traceback. I assert this is more useful for developers, and reduces the need to make people care about .cause.exception.
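
Not the live-API traceback requested below, but a minimal, self-contained illustration of the six.reraise pattern: the frame where the original exception was raised still appears at the bottom of the printed traceback, even though the exception type is now the wrapper:

import sys

import six


class WrapperError(Exception):
    pass


def fail():
    raise ValueError('original problem')


try:
    fail()
except ValueError as exception:
    wrapped = WrapperError('wrapped: %s' % exception)
    # Re-raise the wrapper, but attach the original traceback so the
    # "raise ValueError" line inside fail() remains visible.
    six.reraise(WrapperError, wrapped, sys.exc_info()[2])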


@lukesneeringer can I see an example traceback with and without this?

Contributor Author


Yeah. I will need a few minutes to piece together getting one from a live API, though.

Contributor Author


@jonparrott Heading to a meeting, so this will be delayed. I have not forgotten it.

Contributor


+1. Ideally it would also be possible to determine that the wrapped exception originated at this line of code.


__all__ = ['GaxError', 'MockGrpcException', 'RetryError']


class MockGrpcException(config.API_ERRORS[0]):
Contributor


Inheriting from config.API_ERRORS[0] feels a little awkward to me, but I guess it avoids mocking google.gax.config, which is what we were doing before?

try:
    # Create the callable and establish that we get a GaxError.
    my_callable = retry.retryable(mock_func, retry_options)
    with self.assertRaises(errors.GaxError):
Contributor


This is testing a weaker condition than before, since we're no longer checking the cause of the GaxError.
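
Not in the diff itself, but a self-contained sketch of the stronger assertion being described here: check both the wrapper type and its cause. The local GaxError stand-in assumes only that the real one keeps the wrapped exception on .cause:

import unittest


class GaxError(Exception):
    # Stand-in for google.gax.errors.GaxError: a message plus a cause.
    def __init__(self, msg, cause=None):
        super(GaxError, self).__init__(msg)
        self.cause = cause


def failing_call():
    raise GaxError('RetryError: deadline exceeded', cause=ValueError('boom'))


class CauseAssertionTest(unittest.TestCase):
    def test_wrapper_and_cause(self):
        with self.assertRaises(GaxError) as caught:
            failing_call()
        # The stronger condition: assert on the cause as well.
        self.assertIsInstance(caught.exception.cause, ValueError)


if __name__ == '__main__':
    unittest.main()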

try:
    # Establish that the custom exception is bubbled up (not wrapped), and
    # that the retryable function was called only once, not twice.
    with self.assertRaises(ValueError):
Contributor


Again, this is testing a weaker condition, since it ignores the cause.

Contributor Author


In this case, there is no cause. The original exception is being bubbled.

Contributor


Ah, right. Ignore that comment.

Successfully merging this pull request may close these issues.

RetryErrors obscure actual errors.
3 participants