Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To-singleton coercion of SQL-style SELECT subqueries #1012

Merged
merged 7 commits into from
Mar 27, 2023
Merged

Conversation

vgapeyev
Copy link
Contributor

@vgapeyev vgapeyev commented Mar 16, 2023

Relevant Issues

Description

Resolves #826, implementing to-singleton subquery coercion for SQL contexts of SQL-style SELECTs, following what is outlined in Ch 9 of the PartiQL specification.

Natural follow-ups would be #1011 and #42.

The implementation has two parts:

  • New built-in function COLL_TO_SCALAR (defined in the spec), which is used to wrap SELECTs eligible for the coercion.
  • New visitor SubqueryCoercionVisitorTransform that detects eligible SELECTs and applies COLL_TO_SCALAR to them.

Behavior of COLL_TO_SCALAR (returning MISSING vs an error when the coercion cannot be performed) depends on compilation options, particularly on TypingMode. This dependency is new, compared to other built-ins. Unfortunately, the accommodation for supporting this is not yet as generic as it should be, since our two compilers use different sets of options.

An alternative to COLL_TO_SCALAR would be emitting the code corresponding to its body somewhere during compilation. This would have avoided the need of making compilation options a parameter to built-ins, but it seems it would require two entirely different implementations in EvaluatingCompiler and the "planning compiler." Also, the approach with the explicit COLL_TO_SCALAR is closer to the specification -- a worthwhile thing in the reference implementation. It could be worthwhile considering this for other functions defined in the spec, such as TUPLEUNION.

The implementation of the visitor covers the contexts explicitly mentioned in the specification, but there are a few more worth considering separately -- #42.

There are also a couple unrelated touch-ups of the underlying codebase, in two first commits.

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES]

  • Any backward-incompatible changes? [YES]

    • Since subqueries were allowed, but not coersed, their behavior was different.
  • Any new external dependencies? [NO]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@am357 am357 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Left some comment; can we also use the PR template, I guess it clarifies things like backward compatibility easier.

Copy link
Contributor Author

@vgapeyev vgapeyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arash, thanks for reviewing!

I believe I responded to everything -- either here or in the catch-up commit.

@codecov-commenter
Copy link

codecov-commenter commented Mar 23, 2023

Codecov Report

Patch coverage: 97.74% and project coverage change: +0.16 🎉

Comparison is base (46219fe) 77.27% compared to head (7e90a2d) 77.44%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1012      +/-   ##
============================================
+ Coverage     77.27%   77.44%   +0.16%     
- Complexity     1991     2056      +65     
============================================
  Files           235      237       +2     
  Lines         16557    16679     +122     
  Branches       2909     2970      +61     
============================================
+ Hits          12795    12917     +122     
  Misses         2923     2923              
  Partials        839      839              
Flag Coverage Δ
CLI 14.15% <ø> (ø)
EXAMPLES 80.67% <ø> (ø)
LANG 82.99% <97.74%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...g/src/main/kotlin/org/partiql/lang/SqlException.kt 75.00% <0.00%> (ø)
...n/org/partiql/lang/errors/ErrorAndErrorContexts.kt 79.64% <ø> (-0.18%) ⬇️
.../eval/visitors/SubqueryCoercionVisitorTransform.kt 97.75% <97.75%> (ø)
...c/main/kotlin/org/partiql/lang/CompilerPipeline.kt 79.12% <100.00%> (+0.46%) ⬆️
...rg/partiql/lang/compiler/PartiQLCompilerBuilder.kt 92.85% <100.00%> (+0.12%) ⬆️
...c/main/kotlin/org/partiql/lang/errors/ErrorCode.kt 87.23% <100.00%> (+0.06%) ⬆️
...main/kotlin/org/partiql/lang/eval/ErrorSignaler.kt 69.69% <100.00%> (+6.06%) ⬆️
...partiql/lang/eval/builtins/DefinitionalBuiltins.kt 100.00% <100.00%> (ø)
...rg/partiql/lang/eval/visitors/VisitorTransforms.kt 91.66% <100.00%> (+0.75%) ⬆️
.../org/partiql/lang/planner/PartiQLPlannerDefault.kt 90.00% <100.00%> (+0.16%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@am357 am357 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments around the CHANGELOG entry and JavaDoc no major issues.
Thanks!

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Comment on lines 11 to 13
* The implementation deals with this by using the visitor to find possible _contexts_ of possible SELECT subqueries
* (rather than the subqueries themselves) and then inspecting each context to find the subquery
* and coerce it, if eligible. (In this problem, a "context" is an AST node that can contain an eligible subquery.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the implementation details is here likely for better readability but I generally believe adding the how in the JavaDoc of the class can result in discrepancies between the class documentation and the logic, if the logic changes. I'd personally rather know the what w/some usage examples here and leave the how either as a commend in the code or removed altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Moving this to a non-doc comment.

Comment on lines +15 to +109
TransformTestCase(
name = "Bare SELECT is not coerced",
original = """ SELECT max(n) FROM [1,2,3] AS n """,
expected = """ SELECT max(n) FROM [1,2,3] AS n """
),
TransformTestCase(
name = "Coerce subquery in an arithmetic rhs",
original = """ 5 + (SELECT max(n) FROM <<1,2,3>> AS n) """,
expected = """ 5 + coll_to_scalar(SELECT max(n) FROM <<1,2,3>> AS n) """
),
TransformTestCase(
name = "Coerce subquery in an arithmetic lhs",
original = """ (SELECT n FROM <<1, 2, 3>> AS n) * 4 """,
expected = """ coll_to_scalar(SELECT n FROM <<1, 2, 3>> AS n) * 4 """
),
TransformTestCase(
name = "Coerce in scalar comparison lhs",
original = """ x < (SELECT avg(n) FROM <<1,2,3>> AS n) """,
expected = """ x < coll_to_scalar(SELECT avg(n) FROM <<1,2,3>> AS n)"""
),
TransformTestCase(
name = "Coerce in scalar comparison rhs",
original = """ (SELECT n FROM [1,2,3] AS n) = 2 """,
expected = """ coll_to_scalar(SELECT n FROM [1,2,3] AS n) = 2 """
),
TransformTestCase(
name = "Coerce on both sides of a scalar comparison",
original = """ (SELECT n FROM [1,2,3] AS n) <> (SELECT avg(n) FROM <<1,2,3>> AS n) """,
expected = """ coll_to_scalar(SELECT n FROM [1,2,3] AS n) <> coll_to_scalar(SELECT avg(n) FROM <<1,2,3>> AS n) """
),
TransformTestCase(
name = "Do not coerce in lhs of array comparison",
original = """ (SELECT n FROM [1,2,3] AS n) = [3, 2, 1] """,
expected = """ (SELECT n FROM [1,2,3] AS n) = [3, 2, 1] """
),
TransformTestCase(
name = "Do not coerce in rhs of array comparison",
original = """ (1, 2, 3) < (SELECT n FROM [1,2,3] AS n) """,
expected = """ (1, 2, 3) < (SELECT n FROM [1,2,3] AS n)"""
),
TransformTestCase(
name = "Coerce lhs of IN, but not rhs",
original = """ (SELECT max(n) FROM <<1,2,3>> AS n) IN (SELECT n FROM [1,2,3] AS n) """,
expected = """ coll_to_scalar(SELECT max(n) FROM <<1,2,3>> AS n) IN (SELECT n FROM [1,2,3] AS n) """
),
TransformTestCase(
name = "Coerce subquery that is the WHERE expression",
original = """
SELECT x.b
FROM << {'a': 1, 'b': 10}, {'a': 3, 'b': 30} >> AS x
WHERE (SELECT max(n) = x.a FROM [1,2,3] AS n)
""".trimIndent(),
expected = """
SELECT x.b
FROM << {'a': 1, 'b': 10}, {'a': 3, 'b': 30} >> AS x
WHERE coll_to_scalar(SELECT max(n) = x.a FROM [1,2,3] AS n)
""".trimIndent()
),
TransformTestCase(
name = "Coerce SELECT when the only item in another SELECT",
original = """ SELECT (SELECT max(n) FROM [1,2,3] AS n) AS m FROM 0 """,
expected = """ SELECT coll_to_scalar(SELECT max(n) FROM [1,2,3] AS n) AS m FROM 0 """
),
TransformTestCase(
name = "Coerce SELECT when among items in another SELECT",
original = """
SELECT x.b,
(SELECT y.c
FROM << {'a': 1, 'c': 100}, {'a': 3, 'c': 300} >> AS y
WHERE y.a = x.a ) AS c2
FROM << {'a': 1, 'b': 10}, {'a': 3, 'b': 30} >> AS x
""".trimIndent(),
expected = """
SELECT x.b,
coll_to_scalar(
SELECT y.c
FROM << {'a': 1, 'c': 100}, {'a': 3, 'c': 300} >> AS y
WHERE y.a = x.a ) AS c2
FROM << {'a': 1, 'b': 10}, {'a': 3, 'b': 30} >> AS x
""".trimIndent()
),
TransformTestCase(
name = "Do not coerce data sources in FROM",
original = """
SELECT n2.n1, m2.m
FROM (SELECT n1 FROM <<1,2,3>> AS n1) AS n2,
(SELECT min(m1) AS m FROM [4,5] AS m1) AS m2
WHERE 2 * n2.n1 = m2.m
""".trimIndent(),
expected = """
SELECT n2.n1, m2.m
FROM (SELECT n1 FROM <<1,2,3>> AS n1) AS n2,
(SELECT min(m1) AS m FROM [4,5] AS m1) AS m2
WHERE 2 * n2.n1 = m2.m
""".trimIndent()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: Arash Maymandi <27716912+am357@users.noreply.github.com>
@github-actions
Copy link

Conformance comparison report

Base (c4f061b) 4003f6c +/-
% Passing 97.40% 97.42% 0.02%
✅ Passing 4270 4271 1
❌ Failing 114 113 -1
🔶 Ignored 0 0 0
Total Tests 4384 4384 0

Number passing in both: 4270

Number failing in both: 113

Number passing in Base (c4f061b) but now fail: 0

Number failing in Base (c4f061b) but now pass: 1

The following test(s) were previously failing but now pass. Before merging, confirm they are intended to pass:

Click here to see
  • inner select evaluating to collection with more than one element, compileOption: PERMISSIVE

@vgapeyev vgapeyev merged commit 0a7c889 into main Mar 27, 2023
@vgapeyev vgapeyev deleted the subquery-coercion branch March 27, 2023 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Coerce SELECT Clause Which Returns A Single Item Into A Scalar
3 participants