Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, opt: avoid full scans in mutation queries with cost flags #137984

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Dec 25, 2024

opt: convert memo.Cost to a struct with CostFlags

This is a mechanical change that will set up the ability to use
CostFlags in a future commit.

Release note: None

opt,sql: support hint to avoid full scan

Added a hint to avoid full scans (see release note below for details).
To support this change, added a field to memo.Cost with a new type
memo.CostFlags, which contains a number of boolean flags and supports
"multi-dimensional costing". This allows the optimizer to compare plans
based on the flags set in addition to the single-dimensional float64
cost. For example, plans with the new FullScanPenalty cost flag enabled
will always be more expensive than plans without any cost flags, even
if the base float64 cost is lower.

The new CostFlags type also includes a flag for HugeCostPenalty, which
must be set for plans with hugeCost. This ensures that existing
hints that use hugeCost still work if some other cost flags are set,
since HugeCostPenalty takes precedence over other cost flags.

This new CostFlags field is needed to support hints that do not cause an
error if the optimizer cannot find a plan complying with the hint. This
is needed because the previous approach of simply using hugeCost to
avoid certain plans meant that if such plans were unavoidable, we could
not effectively compare plans with cost greater than hugeCost due to
loss of floating point precision.

Informs #79683

Release note (sql change): Added support for a new index hint,
AVOID_FULL_SCAN, which will prevent the optimizer from planning a
full scan for the specified table if any other plan is possible. The
hint can be used in the same way as other existing index hints. For
example, SELECT * FROM table_name@{AVOID_FULL_SCAN};. This hint is
similar to NO_FULL_SCAN, but will not error if a full scan cannot be
avoided. Note that normally a full scan of a partial index would not
be considered a "full scan" for the purposes of the NO_FULL_SCAN and
AVOID_FULL_SCAN hints, but if the user has explicitly forced the
partial index via FORCE_INDEX=index_name, we do consider it a full
scan.

sql,opt: add setting avoid_full_table_scans_in_mutations

Fixes #79683

Release note (sql change): Added a new session setting
avoid_full_table_scans_in_mutations, which when set to true, causes
the optimizer to avoid planning full table scans for mutation queries
if any other plan is possible. It now defaults to true.

opt: remove a stale comment above optbuilder.buildScan

Removed a comment that references a function parameter that no longer
exists.

Release note: None

Copy link

blathers-crl bot commented Dec 25, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft rytaft force-pushed the avoid-full-scans-flags branch 5 times, most recently from d55cf61 to f0b63a5 Compare December 26, 2024 15:21
@rytaft
Copy link
Collaborator Author

rytaft commented Dec 26, 2024

pkg/sql/opt/xform/testdata/rules/limit line 2770 at r4 (raw file):

                ├── scan t82730a@t82730a_col1_1_col1_3_key
                │    ├── columns: col1_0:7!null col1_1:8 col1_3:9
                │    ├── constraint: /8/9

I'm looking into this.

@rytaft
Copy link
Collaborator Author

rytaft commented Dec 26, 2024

pkg/sql/opt/xform/testdata/rules/limit line 2844 at r4 (raw file):

                ├── scan t82730b@t82730b_col1_1_col1_3_key [as=tab_41831]
                │    ├── columns: col1_0:7!null col1_1:8 col1_3:9
                │    ├── constraint: /8/9

Ditto

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/xform/testdata/rules/limit line 2770 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'm looking into this.

This is an unrelated bug: #137994


pkg/sql/opt/xform/testdata/rules/limit line 2844 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Ditto

See #137994

@rytaft rytaft marked this pull request as ready for review December 26, 2024 16:12
@rytaft rytaft requested review from a team as code owners December 26, 2024 16:12
@rytaft rytaft requested review from DrewKimball, mgartner and a team and removed request for a team and DrewKimball December 26, 2024 16:12
@rytaft
Copy link
Collaborator Author

rytaft commented Dec 26, 2024

pkg/sql/opt/memo/cost.go line 14 at r4 (raw file):

// TODO: Need more details about what one "unit" of cost means.
type Cost struct {
	Cost  float64

Happy to bikeshed on names here -- maybe the "Cost" data member should just be "C"?

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

:lgtm:

Reviewed 24 of 24 files at r1, 24 of 24 files at r2, 84 of 84 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/memo/cost.go line 14 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Happy to bikeshed on names here -- maybe the "Cost" data member should just be "C"?

I like C. 👍


pkg/sql/opt/xform/testdata/rules/limit line 2770 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This is an unrelated bug: #137994

I'm worried this change makes this bad plan more likely to hit, so I'm going to label that bug a GA-blocker for now.


pkg/sql/opt/memo/cost.go line 53 at r2 (raw file):

type CostFlags struct {
	FullScanPenalty bool
	HugeCostPenalty bool

nit: an explanation of each of these might be useful


pkg/sql/opt/xform/coster.go line 193 at r2 (raw file):

	// If the final expression has this cost or larger, it means that there was no
	// plan that could satisfy the hints.
	hugeCost = memo.Cost{Cost: 1e100, Flags: memo.CostFlags{HugeCostPenalty: true}}

Do we still need the 1e100 cost for hugeCost?


pkg/ccl/logictestccl/testdata/logic_test/multi_region_remote_access_error line 505 at r3 (raw file):

# Delete accessing all regions should fail.
retry
statement error pq: Query has no home region\. Try adding a filter on messages_rbr\.crdb_region and/or on key column \(messages_rbr\.message\)\. For more information, see https://www.cockroachlabs.com/docs/stable/cost-based-optimizer.html#control-whether-queries-are-limited-to-a-single-region

Any idea why this changed?

This is a mechanical change that will set up the ability to use
CostFlags in a future commit.

Release note: None
Added a hint to avoid full scans (see release note below for details).
To support this change, added a field to memo.Cost with a new type
memo.CostFlags, which contains a number of boolean flags and supports
"multi-dimensional costing". This allows the optimizer to compare plans
based on the flags set in addition to the single-dimensional float64
cost. For example, plans with the new FullScanPenalty cost flag enabled
will always be more expensive than plans without any cost flags, even
if the base float64 cost is lower.

The new CostFlags type also includes a flag for HugeCostPenalty, which
must be set for plans with "hugeCost". This ensures that existing
hints that use hugeCost still work if some other cost flags are set,
since HugeCostPenalty takes precedence over other cost flags.

This new CostFlags field is needed to support hints that do not cause an
error if the optimizer cannot find a plan complying with the hint. This
is needed because the previous approach of simply using "hugeCost" to
avoid certain plans meant that if such plans were unavoidable, we could
not effectively compare plans with cost greater than hugeCost due to
loss of floating point precision.

Informs cockroachdb#79683

Release note (sql change): Added support for a new index hint,
AVOID_FULL_SCAN, which will prevent the optimizer from planning a
full scan for the specified table if any other plan is possible. The
hint can be used in the same way as other existing index hints. For
example, SELECT * FROM table_name@{AVOID_FULL_SCAN};. This hint is
similar to NO_FULL_SCAN, but will not error if a full scan cannot be
avoided. Note that normally a full scan of a partial index would not
be considered a "full scan" for the purposes of the NO_FULL_SCAN and
AVOID_FULL_SCAN hints, but if the user has explicitly forced the
partial index via FORCE_INDEX=index_name, we do consider it a full
scan.
Fixes cockroachdb#79683

Release note (sql change): Added a new session setting
avoid_full_table_scans_in_mutations, which when set to true, causes
the optimizer to avoid planning full table scans for mutation queries
if any other plan is possible. It now defaults to true.
Removed a comment that references a function parameter that no longer
exists.

Release note: None
@rytaft rytaft force-pushed the avoid-full-scans-flags branch from f0b63a5 to 5fc24c9 Compare December 28, 2024 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: the optimizer cost model should consider contention
3 participants