feat: support aborted transactions internal retry #544
Conversation
@c24t, @olavloite, I've pushed a part of the transaction retry mechanism implementation, PTAL.

Pushed the second step of the implementation: added actual retrying of aborted transactions. It still requires adding more unit tests for …
```python
client_mock.assert_called_once_with(
    project=PROJECT,
    credentials=CREDENTIALS,
    client_info=CLIENT_INFO,
```
This test is actually broken by another PR, but nox doesn't see it and doesn't run it, because these tests were not moved to the unit directory.
```python
from google.cloud.spanner_dbapi import Connection, InterfaceError
from google.cloud.spanner_dbapi.checksum import ResultsChecksum
from google.cloud.spanner_dbapi.connection import AUTOCOMMIT_MODE_WARNING
from google.cloud.spanner_v1.database import Database
from google.cloud.spanner_v1.instance import Instance
```
Why are these tests still here? They were copied into the unit directory, so I suppose they should be erased from this directory?!
Good question. Looks like this (and test_connect) weren't moved in #532? The test files weren't exactly copied; #532 changed them and added some new tests. E.g. the version on master now doesn't include test_transaction_autocommit_warnings.
@mf2199 can you confirm that you meant to change/remove these tests before removing tests/spanner_dbapi in this PR?
@olavloite, am I understanding correctly that …
@IlyaFaer Yes, that is correct.
Besides cleaning up the tests, I wonder if this PR is catching Aborted errors in the right place. Since we're only calling execute_sql and not the streaming variant, I'd expect to see errors immediately instead of when we iterate over the results in the cursor. Am I missing something here?
```python
    return
except Aborted:
    self.connection.retry_transaction()
    return self.fetchone()
```
I think this will be a problem. Assuming that this is using the ExecuteStreamingSql RPC, each next() call could potentially mean that a new RPC is executed. So for the sake of simplicity, assume in the example below that each call to next() executes the ExecuteStreamingSql RPC.

Assume the following situation:

- The table Singers contains the following singers (last names): Allison, Morrison, Pieterson.
- The application executes the query `SELECT LastName FROM Singers ORDER BY LastName` in transaction 1.
- The client application calls fetchone(), which returns 'Allison'.
- Some other transaction executes `DELETE FROM Singers WHERE LastName='Pieterson'`.
- The first transaction is aborted by the backend. A retry is executed, and the retry logic checks that the checksum of the retried result set is equal to the original attempt, which it is, as the first record is still 'Allison'.
- The client application calls fetchone() again. This should return 'Morrison', but as it needs to call ExecuteStreamingSql, it will (probably) use the transaction id of the original transaction (unless that transaction id has somehow been replaced in the underlying iterator). If it does use the old transaction id, the RPC will fail with yet another Aborted error, and that will repeat itself until the transaction retry limit has been reached.
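To illustrate why the retry validation can pass in this scenario even though the stream is stale: the checksum is computed only over the rows the application has already consumed. A minimal sketch (hypothetical `RowChecksum` class, standing in for the client's ResultsChecksum helper, not the actual implementation):

```python
import hashlib
import pickle

class RowChecksum:
    """Order-sensitive checksum over the result rows consumed so far.

    Hypothetical stand-in for the client's ResultsChecksum helper.
    """
    def __init__(self):
        self._digest = hashlib.sha256()
        self.count = 0

    def consume(self, row):
        self._digest.update(pickle.dumps(row))
        self.count += 1

    def __eq__(self, other):
        return (self.count == other.count
                and self._digest.digest() == other._digest.digest())

# Original attempt consumed only the first row ('Allison').
original = RowChecksum()
original.consume(("Allison",))

# The retry replays the query and consumes the same number of rows.
retry = RowChecksum()
retry.consume(("Allison",))

# Checksums match, so the retry is considered successful, even though
# later rows (e.g. 'Pieterson') may have changed in the meantime.
assert original == retry
```

The comparison says nothing about rows that have not been fetched yet, which is exactly the gap the scenario above exploits.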
Seems to me we can just drop the _transaction property, so that Connection will initiate a new one on the next execute() call.
Sorry for reopening this, and this comment should not be considered blocking for merging this PR, but I think we need to look into this once more. Only dropping the _transaction property will in this case not be enough, for the following reason:

- When executeSql is called, a streaming iterator is returned to the application.
- That streaming iterator is linked with the transaction that was active at that moment, and a reference to that transaction is also held in the iterator.
- If a transaction is aborted and the client application has consumed only part of a streaming iterator, that iterator is no longer valid (at least, it will throw an exception if it needs to receive more data from the server).

The JDBC driver client solves the above problem by wrapping all streaming iterators before returning them to the client application. That makes it possible for the JDBC driver to replace the underlying streaming iterator with a new one when a transaction has been aborted and successfully retried.

We should add that to the Python DBApi as well, but we could do that in a separate PR to prevent this PR from becoming even bigger than it already is.
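The wrapping approach described above can be sketched roughly as follows (illustrative names only, not the JDBC driver's or the Python client's actual API): the application holds a reference to the wrapper, so the driver is free to swap the underlying stream after a successful retry.

```python
class ReplaceableIterator:
    """Wraps a streaming iterator so the underlying stream can be
    swapped out after a transaction retry. Hypothetical sketch of
    the wrapping technique, not the real driver code.
    """
    def __init__(self, stream):
        self._stream = iter(stream)

    def replace(self, new_stream):
        """Point the wrapper at a fresh stream from the retried transaction."""
        self._stream = iter(new_stream)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._stream)

# The application keeps a single reference to the wrapper...
rows = ReplaceableIterator([("Allison",)])
assert next(rows) == ("Allison",)

# ...so after a retry the driver can splice in the new stream,
# already advanced past the rows the application consumed.
rows.replace([("Morrison",), ("Pieterson",)])
assert next(rows) == ("Morrison",)
```

The key property is that the stale stream (still carrying the aborted transaction's id) is never handed to the application directly.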
@c24t, @olavloite, hm-m. I think we're protected from errors here, because our connection API doesn't actually give streaming result objects to the user.

Here is where we're getting a streaming iterator:
python-spanner-django/google/cloud/spanner_dbapi/cursor.py
Lines 167 to 170 in 196c449
```python
self._result_set = transaction.execute_sql(
    sql, params, param_types=get_param_types(params)
)
self._itr = PeekIterator(self._result_set)
```
So the iterator is held in the protected property _itr, and users will be streaming it with the Cursor.fetch*() methods, without actual access to the iterator itself:
python-spanner-django/google/cloud/spanner_dbapi/cursor.py
Lines 204 to 212 in 196c449
```python
def fetchone(self):
    """Fetch the next row of a query result set, returning a single
    sequence, or None when no more data is available."""
    self._raise_if_closed()
    try:
        return next(self)
    except StopIteration:
        return None
```
Where next(self) calls next(self._itr) here:
python-spanner-django/google/cloud/spanner_dbapi/cursor.py
Lines 293 to 296 in 196c449
```python
def __next__(self):
    if self._itr is None:
        raise ProgrammingError("no results to return")
    return next(self._itr)
```
Thus, if a transaction fails, the connection will drop the transaction, check out a new one, and re-run all the statements, each of which will replace _itr with a new streamed iterator. So all the iterators are processed internally and will be replaced on a retry, as I see it.
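A toy model of the behaviour described here (illustrative `MiniCursor` class, not the real Cursor): on retry the remembered statement is re-run and the cursor's iterator is replaced wholesale, so no stale stream survives.

```python
class MiniCursor:
    """Hypothetical simplified model of the cursor behaviour
    described above; names and structure are illustrative only.
    """
    def __init__(self, run_sql):
        self._run_sql = run_sql  # callable: sql -> iterable of rows
        self._itr = None
        self._sql = None

    def execute(self, sql):
        self._sql = sql
        self._itr = iter(self._run_sql(sql))

    def retry(self):
        # Drop the stale iterator and build a new one from a fresh run,
        # mirroring how a retried transaction replaces _itr.
        self._itr = iter(self._run_sql(self._sql))

    def fetchone(self):
        try:
            return next(self._itr)
        except StopIteration:
            return None

cur = MiniCursor(lambda sql: [("Allison",), ("Morrison",)])
cur.execute("SELECT LastName FROM Singers ORDER BY LastName")
old_itr = cur._itr
cur.retry()
assert cur._itr is not old_itr      # the stale iterator is gone
assert cur.fetchone() == ("Allison",)
```

Note that, as the earlier comment points out, a real retry also has to re-consume and checksum the rows the application already fetched; this sketch only shows the iterator replacement.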
Thanks for the updates. I think this is coming very close to what we need, but I have a couple of small questions.
No more substantial comments from me, but note that this whole PR will have to move to python-spanner now that googleapis/python-spanner#160 is in.
@olavloite, does this address all your comments?
Yes, my comments have been addressed. I do think that it's important that we get the (integration) tests running soon, as there could still be corner cases that we haven't thought of yet that only occur incidentally. I also have a separate concern regarding streaming iterators, but that should not block the merging of this PR.

This PR won't be merged here, see googleapis/python-spanner#168.

Merged into the original Spanner client repo.
🤖 I have created a release \*beep\* \*boop\*

---

## [2.1.0](https://www.github.com/googleapis/python-spanner/compare/v2.0.0...v2.1.0) (2020-11-24)

### Features

* **dbapi:** add aborted transactions retry support ([#168](https://www.github.com/googleapis/python-spanner/issues/168)) ([d59d502](https://www.github.com/googleapis/python-spanner/commit/d59d502590f618c8b13920ae05ab11add78315b5)), closes [#34](https://www.github.com/googleapis/python-spanner/issues/34) [googleapis/python-spanner-django#544](https://www.github.com/googleapis/python-spanner-django/issues/544)
* remove adding a dummy WHERE clause into UPDATE and DELETE statements ([#169](https://www.github.com/googleapis/python-spanner/issues/169)) ([7f4d478](https://www.github.com/googleapis/python-spanner/commit/7f4d478fd9812c965cdb185c52aa9a8c9e599bed))

### Bug Fixes

* Add sqlparse dependency ([#171](https://www.github.com/googleapis/python-spanner/issues/171)) ([e801a2e](https://www.github.com/googleapis/python-spanner/commit/e801a2e014fcff66a69cb9da83abedb218cda2ab))

### Reverts

* Revert "test: unskip list_backup_operations sample test (#170)" (#174) ([6053f4a](https://www.github.com/googleapis/python-spanner/commit/6053f4ab0fc647a9cfc181e16c246141483c2397)), closes [#170](https://www.github.com/googleapis/python-spanner/issues/170) [#174](https://www.github.com/googleapis/python-spanner/issues/174)

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please).
Implement aborted transactions retry mechanism.

While executing SQL statements in !autocommit mode, the connection must remember every executed statement. In case the transaction is aborted, all of these statements should be re-executed. While doing this, the connection must also calculate a checksum of every statement's results, so that we can check whether the retried transaction got the same results the original one got. If the checksums are not equal, there is no way to continue the transaction, as the underlying data was changed during the retry.

Closes #539