Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, opt: avoid full scans in mutation queries #137781

Closed
wants to merge 3 commits into from

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Dec 19, 2024

opt,sql: support hint to avoid full scan

Informs #79683

Release note (sql change): Added support for a new index hint,
AVOID_FULL_SCAN, which will prevent the optimizer from planning a
full scan for the specified table if any other plan is possible. The
hint can be used in the same way as other existing index hints. For
example, SELECT * FROM table_name@{AVOID_FULL_SCAN};. Note that a
full scan of a partial index may still be preferred by the optimizer,
unless AVOID_FULL_SCAN is used in combination with a specific partial
index via FORCE_INDEX=index_name. This hint is similar to NO_FULL_SCAN,
but will not error if a full scan cannot be avoided.

sql,opt: add setting avoid_full_table_scans_in_mutations

Fixes #79683

Release note (sql change): Added a new session setting
avoid_full_table_scans_in_mutations, which when set to true, causes
the optimizer to avoid planning full table scans for mutation queries
if any other plan is possible. It now defaults to true.

opt: remove a stale comment above optbuilder.buildScan

Removed a comment that references a function parameter that no longer
exists.

Release note: None

Copy link

blathers-crl bot commented Dec 19, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft rytaft force-pushed the avoid-full-scans branch 3 times, most recently from 1e19a05 to cff59af Compare December 20, 2024 01:46
Informs cockroachdb#79683

Release note (sql change): Added support for a new index hint,
AVOID_FULL_SCAN, which will prevent the optimizer from planning a
full scan for the specified table if any other plan is possible. The
hint can be used in the same way as other existing index hints. For
example, SELECT * FROM table_name@{AVOID_FULL_SCAN};. Note that a
full scan of a partial index may still be preferred by the optimizer,
unless AVOID_FULL_SCAN is used in combination with a specific partial
index via FORCE_INDEX=index_name. This hint is similar to NO_FULL_SCAN,
but will not error if a full scan cannot be avoided.
@rytaft rytaft force-pushed the avoid-full-scans branch 2 times, most recently from 07c79dc to e599fcd Compare December 20, 2024 02:34
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/exec/execbuilder/testdata/update line 404 at r3 (raw file):

└── • render
    │
    └── • limit

Need to investigate why we lost the top k here


pkg/sql/opt/exec/execbuilder/testdata/unique line 5707 at r3 (raw file):

vectorized: true
·
• root

This seems concerning that we're no longer getting the insert fast path here.

@rytaft rytaft marked this pull request as ready for review December 20, 2024 02:47
@rytaft rytaft requested review from a team as code owners December 20, 2024 02:47
@rytaft rytaft requested review from mw5h and mgartner and removed request for a team December 20, 2024 02:47
Fixes cockroachdb#79683

Release note (sql change): Added a new session setting
avoid_full_table_scans_in_mutations, which when set to true, causes
the optimizer to avoid planning full table scans for mutation queries
if any other plan is possible. It now defaults to true.
Removed a comment that references a function parameter that no longer
exists.

Release note: None
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r1, 87 of 87 files at r4, 87 of 87 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mw5h and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/unique line 5707 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This seems concerning that we're no longer getting the insert fast path here.

For some reason the optimizer no longer picks lookup joins into the partial indexes...


pkg/sql/opt/exec/execbuilder/testdata/update line 404 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Need to investigate why we lost the top k here

I think it's similar to the distinct change—the hugeCost/2 has made the cost of everything else irrelevant.


-- commits line 13 at r1:
nit: I find the language about this partial index behavior in the comment in coster.go easier to understand, maybe consider using that for the release note? Or add another sentence to make it ultra clear that a full scan of a partial index on only avoided if the partial index is forced.


pkg/sql/opt/optbuilder/mutation_builder_arbiter.go line 687 at r5 (raw file):

		if h.mb.b.evalCtx.SessionData().AvoidFullTableScansInMutations {
			indexFlags = &tree.IndexFlags{AvoidFullScan: true}
		}

The expression built by tableScope() is never part of a query plan and is never optimized with exploration rules—it is only used to build scalar expressions (partial index predicates) into the metadata. We really shouldn't be building a whole scan expression just to do that, but it was the simplest way to build all the necessary columns into a scope that the partial index predicate may reference (think of virtual computed columns which are a projection over a scan—to support partial index predicates that reference those, we can't simply use the standard table columns). #61298 tracks making this better.

Anyway, the TLDR is that there is no need to use index flags here.


pkg/sql/opt/exec/execbuilder/testdata/cascade line 928 at r4 (raw file):

        │               │ estimated row count: 10
        │               │ distinct on: a
        │               │

This change is odd. I guess the hugeCost/2 has made the plan without the disctint more-or-less the same cost as the plan with the distinct?


pkg/sql/opt/exec/execbuilder/testdata/fk line 671 at r5 (raw file):

EXPLAIN (OPT) UPDATE child SET p = 4
----
distribute

Do you know why this distribute operator was added in some of these tests?


pkg/sql/opt/optbuilder/testdata/delete line 195 at r5 (raw file):

      ├── columns: x:6!null y:7 z:8 crdb_internal_mvcc_timestamp:9 tableoid:10 column11:11
      ├── internal-ordering: -11
      ├── sort

This looks like a regression too.


pkg/sql/opt/xform/testdata/rules/limit line 2775 at r5 (raw file):

                │    │    ├── [/1000000001 - /1999999999]
                │    │    └── [/2000000001 - ]
                │    ├── flags: avoid-full-scan

I'm not really sure the purpose of this test, but this scan over all partitions + index join is a regression.


pkg/sql/opt/xform/testdata/rules/limit line 2848 at r5 (raw file):

                │    │    ├── [/2 - /4]
                │    │    ├── [/6 - /4999999]
                │    │    └── [/5000001 - ]

This is a regression, too, same as above.

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I'm going to close this PR in favor of https://reviewable.io/reviews/cockroachdb/cockroach/137781. All the comments should be addressed in the new PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @mw5h)


-- commits line 13 at r1:

Previously, mgartner (Marcus Gartner) wrote…

nit: I find the language about this partial index behavior in the comment in coster.go easier to understand, maybe consider using that for the release note? Or add another sentence to make it ultra clear that a full scan of a partial index on only avoided if the partial index is forced.

Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/optbuilder/mutation_builder_arbiter.go line 687 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

The expression built by tableScope() is never part of a query plan and is never optimized with exploration rules—it is only used to build scalar expressions (partial index predicates) into the metadata. We really shouldn't be building a whole scan expression just to do that, but it was the simplest way to build all the necessary columns into a scope that the partial index predicate may reference (think of virtual computed columns which are a projection over a scan—to support partial index predicates that reference those, we can't simply use the standard table columns). #61298 tracks making this better.

Anyway, the TLDR is that there is no need to use index flags here.

Thanks for the info! Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/exec/execbuilder/testdata/cascade line 928 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This change is odd. I guess the hugeCost/2 has made the plan without the disctint more-or-less the same cost as the plan with the distinct?

Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/exec/execbuilder/testdata/fk line 671 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Do you know why this distribute operator was added in some of these tests?

Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/exec/execbuilder/testdata/unique line 5707 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

For some reason the optimizer no longer picks lookup joins into the partial indexes...

Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/exec/execbuilder/testdata/update line 404 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think it's similar to the distinct change—the hugeCost/2 has made the cost of everything else irrelevant.

Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/optbuilder/testdata/delete line 195 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This looks like a regression too.

Fixed in https://reviewable.io/reviews/cockroachdb/cockroach/137781


pkg/sql/opt/xform/testdata/rules/limit line 2775 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I'm not really sure the purpose of this test, but this scan over all partitions + index join is a regression.

This is an unrelated bug, see #137994


pkg/sql/opt/xform/testdata/rules/limit line 2848 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This is a regression, too, same as above.

See #137994

@rytaft
Copy link
Collaborator Author

rytaft commented Dec 26, 2024

Oops, everywhere I said #137781, I meant #137984

@rytaft
Copy link
Collaborator Author

rytaft commented Dec 26, 2024

Closing in favor of #137984

@rytaft rytaft closed this Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: the optimizer cost model should consider contention
3 participants