-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implementation for batch dml in dbapi #1055
Conversation
and parsed_statement.statement_type != StatementType.INSERT | ||
): | ||
raise ProgrammingError( | ||
"Only DML statements are allowed in batch " "DML mode." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this be just one string instead of two concatenated strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
connection._transaction = None | ||
raise Aborted(status.message) | ||
elif status.code != OK: | ||
raise OperationalError(status.message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should (could) this also include the status code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will take it in a follow up PR
@@ -196,6 +203,24 @@ def read_only(self, value): | |||
) | |||
self._read_only = value | |||
|
|||
@property | |||
def batch_mode(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, giving these names that do not start with an underscore will make them part of the public API. In that case, we should document them and also add validations to verify that they are only called with valid arguments. (But probably we should make them internal instead)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the property
Args: | ||
value (BatchMode) | ||
""" | ||
self._batch_mode = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if an external uses calls this function when the connection is already in the middle of a different type of batch (e.g. it is now in a DML batch, and it is called to set it to DDL batch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this property
self._cursor.execute("start batch dml") | ||
self._insert_row(7) | ||
self._insert_row(8) | ||
self._cursor.execute("run batch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably difficult, but: Do we have any way that we could verify that the run batch
call really sends an ExecuteBatchDml
request, and does not execute multiple ExecuteSql
requests sequentially?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean if we could assert something in the test for that?
Manually I have verified it by debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant automated. That would then also guard us against any regressions, for example if something changes in the underlying client library, which cause this to use multiple ExecuteSql
RPCs instead of one ExecuteBatchDml
RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some investigation, so will take it in a follow up PR
VALUES ({i}, 'first-name-{i}', 'last-name-{i}', 'test.email@domen.ru') | ||
""" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also have tests for the unhappy path of using batch DML:
- What happens if the batch contains an invalid statement as the first statement?
- What happens if the batch contains an invalid statement in the middle?
- What happens if the batch contains an invalid statement at the end?
- Can we test and verify that the retry logic for Batch DML works as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test for invalid statements.
There are issues with the retry logic which I am fixing in the next PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good to me, but with one remaining question/nit on the exception that is raised when a batch fails. The exception should also contain the update counts of the statements that did succeed. Are we sure that it does?
self._cursor.execute("start batch dml") | ||
self._insert_row(7) | ||
self._insert_row(8) | ||
self._cursor.execute("run batch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant automated. That would then also guard us against any regressions, for example if something changes in the underlying client library, which cause this to use multiple ExecuteSql
RPCs instead of one ExecuteBatchDml
RPC.
""" | ||
) | ||
with pytest.raises(OperationalError): | ||
self._cursor.execute("run batch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected this to return a specific batch error that:
- Contains the error code, message etc.
- And contains the update counts of the statements that did succeed.
See for example this for the JDBC driver: https://github.com/googleapis/java-spanner-jdbc/blob/1f89f78c37b9e118e2c0cbc7f56d3eb1d5745863/src/test/java/com/google/cloud/spanner/jdbc/it/ITJdbcPreparedStatementTest.java#L797
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed we need to create a custom exception class, so will take it in follow up PR
): | ||
cursor.connection._database = mock_db = mock.MagicMock() | ||
mock_db.run_in_transaction = mock_run_in = mock.MagicMock() | ||
cursor.execute(sql="sql") | ||
mock_run_in.assert_called_once_with( | ||
cursor._do_execute_update_in_autocommit, "sql WHERE 1=1", None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to add WHERE clause has moved to parse_utils.classify_statement
now which we are mocking here so its same as what is returned as per line 255
VALUES ({i}, 'first-name-{i}', 'last-name-{i}', 'test.email@domen.ru') | ||
""" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
No description provided.