Update documents for cross-partition scan and import feature #1301

Merged: brfrn169 merged 8 commits into master from update-relational-scan-and-import-docs on Nov 30, 2023

Contributor

@jnmt jnmt commented Nov 21, 2023

Description

This PR adds documents related to cross-partition scan (previously called relational scan) and the table import feature. It depends on #1294, which is still under review, but PTAL in parallel since reviewing and revising the docs might take a long time.

Related issues and/or PRs

Changes made

  • Add cross-partition scan documents
  • Add table import documents

Checklist

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published. (Add cross-partition scan options #1294 should be merged together)

Additional notes (optional)

  • This PR basically focuses on the specification of the functions and how to use each function. We will provide a holistic guide for the migration to ScalarDB and its sample in other PRs.

Release notes

Added documents for cross-partition scan and table import.

Comment on lines 10 to 11
You should carefully plan to import a table to ScalarDB in production because it will add transaction metadata columns to your database tables and the ScalarDB metadata tables. Your database and ScalarDB also differ in several ways, and some limitations apply.
{% endcapture %}
Contributor Author

After adding the holistic migration guide, I will add a reference for it around here.

Contributor

@kota2and3kan kota2and3kan left a comment

Thank you for the updates!
I left some suggestions and questions.
Please take a look when you have time!

{% capture notice--info %}
**Note**

In the `where()` condition method chain, the conditions must be an and-wise junction of `ConditionalExpression` or `OrConditionSet` (so-called conjunctive normal form) like the above example or an or-wise junction of `ConditionalExpression` or `AndConditionSet` (so-called disjunctive normal form).
Contributor

Suggested change
In the `where()` condition method chain, the conditions must be an and-wise junction of `ConditionalExpression` or `OrConditionSet` (so-called conjunctive normal form) like the above example or an or-wise junction of `ConditionalExpression` or `AndConditionSet` (so-called disjunctive normal form).
In the `where()` condition method chain, the conditions must be an and-wise junction of `ConditionalExpression` or `OrConditionSet` (so-called conjunctive normal form) like the above example or an or-wise junction of `ConditionalExpression` or `AndConditionSet` (so-called disjunctive normal form).

Contributor Author

Fixed in 6726530.
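
To make the two shapes in the `where()` note concrete, here is a minimal, self-contained sketch of conjunctive and disjunctive normal form. The types and method names (`Condition`, `matchesCnf`, `matchesDnf`) are hypothetical stand-ins written for illustration; they are not ScalarDB classes, and the sketch only models how the groups combine.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in types for illustration; these are NOT ScalarDB classes.
final class NormalFormSketch {
  // A single comparison such as "c1 < 10", modeled as a predicate over a record.
  interface Condition extends Predicate<Map<String, Integer>> {}

  // Conjunctive normal form: an AND of OR-groups.
  // Every group must contain at least one condition that holds.
  static boolean matchesCnf(List<List<Condition>> orGroups, Map<String, Integer> record) {
    return orGroups.stream()
        .allMatch(group -> group.stream().anyMatch(c -> c.test(record)));
  }

  // Disjunctive normal form: an OR of AND-groups.
  // At least one group must have all of its conditions hold.
  static boolean matchesDnf(List<List<Condition>> andGroups, Map<String, Integer> record) {
    return andGroups.stream()
        .anyMatch(group -> group.stream().allMatch(c -> c.test(record)));
  }

  public static void main(String[] args) {
    Condition c1Small = r -> r.get("c1") < 10;
    Condition c1Large = r -> r.get("c1") > 20;
    Condition c2Even = r -> r.get("c2") % 2 == 0;

    Map<String, Integer> record = Map.of("c1", 25, "c2", 4);

    // (c1 < 10 OR c1 > 20) AND (c2 is even)
    System.out.println(
        matchesCnf(List.of(List.of(c1Small, c1Large), List.of(c2Even)), record));

    // (c1 < 10 AND c2 is even) OR (c1 > 20 AND c2 is even)
    System.out.println(
        matchesDnf(List.of(List.of(c1Small, c2Even), List.of(c1Large, c2Even)), record));
  }
}
```

A mix such as `(a AND b) OR c AND (d OR e)` fits neither shape, which is the kind of chain the note rules out.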


1. The value range of `BIGINT` in ScalarDB is from -2^53 to 2^53, regardless of the size of `bigint` in the underlying database. Thus, if data outside this range exists in the imported table, ScalarDB cannot read it.
2. For certain data types noted above, ScalarDB may map a data type larger than that of the underlying database. In that case, you will see errors when putting a value with a size larger than the size specified in the underlying database.
3. The maximum size of `BLOB` in ScalarDB is about 2GB (precisely 2^31-1 bytes). In contrast, Oracle `blob` can have (4GB-1)*(number of blocks). Thus, if data larger than 2GB exists in the imported table, ScalarDB cannot read it.
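
As a rough illustration of item 1, the following sketch checks whether a `long` read from the underlying database falls inside the range quoted above. The class and method names are hypothetical, and treating both bounds as inclusive is an assumption; the quoted text only says "from -2^53 to 2^53".

```java
// Hypothetical helper for illustration; not part of ScalarDB.
final class BigIntRangeSketch {
  // Bounds taken from the quoted text: -2^53 to 2^53.
  static final long MIN = -(1L << 53);
  static final long MAX = 1L << 53;

  // Assumption: both bounds are inclusive.
  static boolean fitsScalarDbBigInt(long value) {
    return value >= MIN && value <= MAX;
  }

  public static void main(String[] args) {
    System.out.println(fitsScalarDbBigInt(42L));
    System.out.println(fitsScalarDbBigInt(Long.MAX_VALUE));
  }
}
```

A check like this could be run over existing `bigint` columns before importing, to find rows that would be unreadable afterwards.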
Contributor

Ditto. Do we observe null, some value, or an error?

Contributor Author

Good question again. The 2GB limit is due to Java's byte array limits. I haven't tested it, but I guess the Oracle JDBC driver throws an SQLException. Or, we might see an OOM error if the heap size is not correctly configured. Apart from that, we might be able to handle such large objects better by using JDBC Blob getBlob(...) and offset-based access instead of byte[] getBytes(...).

Contributor

Ah, I see. I understand that it depends on the Java side.
Thank you!
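
The offset-based access mentioned in this thread could look roughly like the sketch below, which walks a `java.sql.Blob` in fixed-size chunks instead of materializing one large `byte[]`. It is only an illustration of the standard JDBC `Blob.length()` and `Blob.getBytes(pos, length)` calls, not how ScalarDB reads `BLOB` columns. The class name, the checksum use case, and the use of `SerialBlob` as an in-memory stand-in for a driver-provided blob are assumptions.

```java
import java.sql.Blob;
import java.sql.SQLException;
import java.util.zip.CRC32;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper for illustration; not part of ScalarDB.
final class BlobChunkSketch {
  // Computes a CRC32 over the blob while holding at most chunkSize bytes at a time.
  static long checksumInChunks(Blob blob, int chunkSize) throws SQLException {
    CRC32 crc = new CRC32();
    long total = blob.length();
    // JDBC Blob positions are 1-based.
    for (long pos = 1; pos <= total; pos += chunkSize) {
      int len = (int) Math.min(chunkSize, total - pos + 1);
      byte[] chunk = blob.getBytes(pos, len);
      crc.update(chunk, 0, chunk.length);
    }
    return crc.getValue();
  }

  public static void main(String[] args) throws Exception {
    // SerialBlob stands in for a blob returned by ResultSet.getBlob(...).
    Blob blob = new SerialBlob(new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
    System.out.println(checksumInChunks(blob, 3));
  }
}
```

Because the position is a `long`, this pattern is not bound by the 2^31-1 byte array limit, although whether a given driver streams efficiently this way would need testing.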

Contributor Author

jnmt commented Nov 24, 2023

@kota2and3kan Thanks for the feedback! Fixed based on the feedback (though the exception-related part is left as is for now), so PTAL when you get a chance.

@jnmt jnmt requested a review from kota2and3kan November 24, 2023 05:56
Contributor

@kota2and3kan kota2and3kan left a comment

LGTM! Thank you!

Contributor

@feeblefakie feeblefakie left a comment

Overall, looking good! Thank you!
Left some comments and suggestions. PTAL!

{% capture notice--warning %}
**Attention**

Except for JDBC databases, we do not recommend enabling cross-partition scan with serializable isolation because it could make the isolation level lower (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.
Contributor

Suggested change
Except for JDBC databases, we do not recommend enabling cross-partition scan with serializable isolation because it could make the isolation level lower (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.
We do not recommend enabling cross-partition scan with serializable isolation for non-JDBC databases because transactions could be executed with lower isolation (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.

The expression it could make the isolation level lower sounds a bit unclear to me.

BTW, sorry, I don't fully remember the discussion we had before.
So, we decided to only warn in case users use the cross-partition scan with serializable isolation for backward compatibility instead of throwing a runtime exception, didn't we?

Contributor Author

Thanks for the comment. Fixed in e3dd00d.

BTW, sorry, I don't fully remember the discussion we had before.
So, we decided to only warn in case users use the cross-partition scan with serializable isolation for backward compatibility instead of throwing a runtime exception, didn't we?

At least, we didn't choose to completely disable the cross-partition scan with serializable isolation in v4.0. Disabling it is one idea, but I think it might be useful in some cases regardless of backward compatibility; e.g., users want to basically run transactions in a serializable manner but sometimes run read-only cross-partition scans without changing the setting.

@brfrn169 Do you have any idea?

Contributor

OK, thank you! So, for now, we need to enable it in 3.x for backward compatibility, and we haven't decided to do so in 4.x (we need to think about what we should do for 4.x). Is my understanding correct?

Contributor Author

Almost, yes. My understanding is that we will keep warning about it, the same as in v3.x, unless we explicitly decide to stop the feature in v4.x.
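
The combination this thread warns about can be sketched as a small configuration check. This is a hypothetical helper, not ScalarDB's own validation. The property `scalar.db.consensus_commit.isolation_level` is named later in this review, while `scalar.db.cross_partition_scan.enabled`, `scalar.db.storage`, and the default values used here are assumptions that should be checked against the configuration docs.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not ScalarDB's own validation logic.
final class CrossPartitionScanConfigCheck {
  // Named in this review thread.
  static final String ISOLATION = "scalar.db.consensus_commit.isolation_level";
  // Assumed property names; verify against the ScalarDB configuration docs.
  static final String STORAGE = "scalar.db.storage";
  static final String CROSS_PARTITION_SCAN = "scalar.db.cross_partition_scan.enabled";

  // True when cross-partition scan is enabled with SERIALIZABLE on a non-JDBC storage.
  static boolean isRiskyCombination(Properties props) {
    boolean scanEnabled =
        Boolean.parseBoolean(props.getProperty(CROSS_PARTITION_SCAN, "false"));
    boolean serializable =
        "SERIALIZABLE".equalsIgnoreCase(props.getProperty(ISOLATION, "SNAPSHOT"));
    boolean jdbc = "jdbc".equalsIgnoreCase(props.getProperty(STORAGE, ""));
    return scanEnabled && serializable && !jdbc;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(STORAGE, "cassandra");
    props.setProperty(ISOLATION, "SERIALIZABLE");
    props.setProperty(CROSS_PARTITION_SCAN, "true");
    if (isRiskyCombination(props)) {
      System.out.println("Warning: transactions could run with lower isolation.");
    }
  }
}
```

This mirrors the warn-only position discussed above: the check reports the combination but does not reject the configuration.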

##### Execute `Scan` without specifying a partition key to retrieve all the records of a table
##### Execute cross-partition `Scan` without specifying a partition key to retrieve all the records of a table

You can execute a `Scan` operation across all partitions without specifying a partition key by enabling the following property in the ScalarDB configuration.
Contributor

Suggested change
You can execute a `Scan` operation across all partitions without specifying a partition key by enabling the following property in the ScalarDB configuration.
You can execute a `Scan` operation across all partitions, which we call cross-partition scan, without specifying a partition key by enabling the following property in the ScalarDB configuration.

Contributor Author

Fixed in e3dd00d.

{% capture notice--warning %}
**Attention**

Except for JDBC databases, we do not recommend enabling cross-partition scan with serializable isolation because it could make the isolation level lower (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.
Contributor

Ditto.

Contributor Author

Fixed in e3dd00d.

Collaborator

@brfrn169 brfrn169 left a comment

Overall, LGTM. Left a couple of comments. Please take a look when you have time!

```java
// Import the table "ns.tbl". If the table is already managed by ScalarDB, the target table does not
// exist, or the table does not meet the requirement of ScalarDB table, an exception will be thrown.
admin.importTable("ns", "tbl");
```
Collaborator

We have added Map<String, String> options as 3rd argument to this method:

Suggested change
admin.importTable("ns", "tbl");
admin.importTable("ns", "tbl", options);

And we didn't add the argument for the 3 branch. To avoid this doc diverging between master and 3, we should probably apply the same change for the 3 branch.

Contributor Author

Good catch, thank you! Fixed in e3dd00d.

And we didn't add the argument for the 3 branch. To avoid this doc diverging between master and 3, we should probably apply the same change for the 3 branch.

I'm OK with using admin.importTable("ns", "tbl", options) even in v3.x, but it might be confusing for users. If we can just apply it without any other concerns, I can handle it since it's not a big divergence. Should I?

Collaborator

Actually, the importTable() method was already introduced in 3.10, but it was treated as an experimental feature. Therefore, I think we can also add the options argument to the 3 branch.

Contributor Author

Ahh, right. Got it. Thank you!
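
For readers following the signature change, the sketch below shows the three-argument call shape together with two of the failure cases listed in the code comment under review. The `Admin` interface and `FakeAdmin` class are local, hypothetical stand-ins so the example runs without ScalarDB on the classpath. The exception types are placeholders, since the thread does not say which exception the real method throws.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; NOT the ScalarDB admin API.
final class ImportTableSketch {
  // Mirrors only the three-argument call shape suggested in the review.
  interface Admin {
    void importTable(String namespace, String table, Map<String, String> options);
  }

  // In-memory fake that rejects missing tables and tables that are already managed.
  static final class FakeAdmin implements Admin {
    private final Set<String> existing;
    private final Set<String> managed = new HashSet<>();

    FakeAdmin(Set<String> existing) {
      this.existing = existing;
    }

    @Override
    public void importTable(String namespace, String table, Map<String, String> options) {
      String name = namespace + "." + table;
      if (!existing.contains(name)) {
        throw new IllegalArgumentException("table does not exist: " + name);
      }
      if (!managed.add(name)) {
        throw new IllegalStateException("table is already managed: " + name);
      }
    }

    boolean isManaged(String namespace, String table) {
      return managed.contains(namespace + "." + table);
    }
  }

  public static void main(String[] args) {
    FakeAdmin admin = new FakeAdmin(Set.of("ns.tbl"));
    Map<String, String> options = Map.of(); // no import options in this sketch

    admin.importTable("ns", "tbl", options);
    try {
      admin.importTable("ns", "tbl", options); // second import of the same table
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```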

// Import tables.
// You can also use a Properties object instead of configFilePath and a serialized-schema JSON
// string instead of schemaFilePath.
SchemaLoader.load(configFilePath, schemaFilePath, tableCreationOptions, createCoordinatorTables);
Collaborator

importTables, right?

Suggested change
SchemaLoader.load(configFilePath, schemaFilePath, tableCreationOptions, createCoordinatorTables);
SchemaLoader.importTables(configFilePath, schemaFilePath, tableCreationOptions);

Contributor Author

Thank you! Fixed in e3dd00d.

Collaborator

One more thing, it looks like the importTables method doesn't receive the createCoordinatorTables argument.

And maybe tableCreationOptions should be renamed to something like tableImportOptions?

Contributor Author

Oops, sorry about that. I fixed it throughout the sample in 148d9e2. PTAL!

@jnmt jnmt requested review from brfrn169 and feeblefakie November 29, 2023 03:23
Collaborator

@brfrn169 brfrn169 left a comment

LGTM! Thank you!

@brfrn169 brfrn169 requested a review from komamitsu November 29, 2023 16:46
Contributor

@komamitsu komamitsu left a comment

LGTM! Thank you!

{% capture notice--warning %}
**Attention**

We do not recommend enabling the cross-partition scan with serializable isolation for non-JDBC databases because transactions could be executed with lower isolation (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.
Contributor

Suggested change
We do not recommend enabling the cross-partition scan with serializable isolation for non-JDBC databases because transactions could be executed with lower isolation (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.
We do not recommend enabling the cross-partition scan with `SERIALIZABLE` isolation level for non-JDBC databases because transactions could be executed with lower isolation (i.e., snapshot). Use it at your own risk only if the consistency does not matter for your transactions.

If you meant scalar.db.consensus_commit.isolation_level, I think it should be capitalized: https://scalardb.scalar-labs.com/docs/latest/configurations/#basic-configurations

Contributor Author

Fixed in de406a2. Thank you for the feedback!

Contributor

@feeblefakie feeblefakie left a comment

LGTM! Thank you!

Member

@josh-wong josh-wong left a comment

I've added some comments and suggestions. PTAL!

Co-authored-by: Josh Wong <joshua.wong@scalar-labs.com>
@jnmt jnmt requested a review from josh-wong November 30, 2023 09:32
Member

@josh-wong josh-wong left a comment

LGTM! Thank you!🙇‍♂️

@brfrn169 brfrn169 merged commit 980a0e2 into master Nov 30, 2023
23 checks passed
@brfrn169 brfrn169 deleted the update-relational-scan-and-import-docs branch November 30, 2023 10:00
feeblefakie pushed a commit that referenced this pull request Nov 30, 2023
Co-authored-by: Josh Wong <joshua.wong@scalar-labs.com>