Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #36088 -- Avoided unnecessary DEFAULT usage on bulk_create(). #18909

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

charettes
Copy link
Member

@charettes charettes commented Dec 10, 2024

Trac ticket number

ticket-36088

Branch description

This is still a draft that builds on top of #18847 to demonstrate the benefit of avoid to use DEFAULT when unneessary.

For example, given the following model

class Article(models.model):
   created_at = models.DateTimeField(db_default=Now())
   title = models.CharField()

Then

Article.objects.bulk_create([Article(title="foo"), ..., Article(title="bar")])

will result in the following SQL

INSERT INTO article (created_at, title) VALUES (DEFAULT, %s), ..., (DEFAULT, %s)

while DEFAULT could be entirely omitted since well, it's the database default?

INSERT INTO article (title) VALUES (%s), ..., (%s)

I discovered this case when testing the UNNSET strategy as unfortunately the unnecessary usage of DEFAULT disables the optimization.

@alexandernst
Copy link

Just out of technical curiosity: how much of a performance gain can we expect from this? I imagine it depends on the number of inserted models, but maybe you ran tests with a fairly large sample dataset?

@charettes
Copy link
Member Author

charettes commented Dec 10, 2024

@alexandernst this PR is an extension of #18847 that allows the optimization implemented over there to be enabled even when the model includes fields making use of db_default

I haven't tested how much of a speed up not specifying DEFAULT when unnecessary (when an INSERT field value is only DEFAULT) but I suspect most of the benefits will be from query generation speed perspective. The main benefits here are that it reduces the amount of parameter bindings for Oracle and allows the UNNEST approach to be used on Postgres when it wasn't the case before.

As for the UNNEST speed up itself you can find more about it on the forum. There were reports of 25% speed up for 1k rows with ten columns but I assume that mileage will vary depending on the number of indices that you have on the table and other schema specificities.

@LilyFoote
Copy link
Contributor

LilyFoote commented Jan 9, 2025

It might make sense to split out the commit adding has_db_default to a separate PR. That would de-clutter this PR and make that available in #19008.

@charettes
Copy link
Member Author

Good idea @LilyFoote; I'll work on that later today.

@charettes charettes force-pushed the insert-db-default-elide branch from b7dcafb to 73934aa Compare January 12, 2025 06:25
@charettes charettes changed the title Avoided unnecessary DEFAULT usage on bulk_create(). Fixed #36088 -- Avoided unnecessary DEFAULT usage on bulk_create(). Jan 12, 2025
@charettes charettes marked this pull request as ready for review January 12, 2025 06:26
Comment on lines 1838 to 1851
elif not supports_default_keyword_in_bulk_insert:
# If the field cannot be excluded from the INSERT for the
# reasons listed above and the backend doesn't support the
# DEFAULT keyword each values must be expanded into their
# underlying expressions.
prepared_db_default = field_prepare(field.db_default)
field_values = [
(
prepared_db_default
if isinstance(value, DatabaseDefault)
else value
)
for value in field_values
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is already covered by tests and has simply be moved over from _prepare_for_bulk_create.

@charettes charettes force-pushed the insert-db-default-elide branch from 73934aa to 4e4f400 Compare January 12, 2025 07:15
Comment on lines 1843 to 1846
prepared_db_default = field_prepare(field.db_default)
field_values = [
(
prepared_db_default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prepared_db_default = field_prepare(field.db_default)
field_values = [
(
prepared_db_default
field_values = [
(
field_prepare(field.db_default)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done explicitly. field.db_default is the same for all instances and field_prepare is not cheap so it's better to prepare it once and use the same value than prepare it N times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha!

django/db/models/sql/compiler.py Show resolved Hide resolved
Comment on lines 1828 to 1851
if field.has_db_default():
# If all values are DEFAULT don't include the field and its
# values in the query as they are redundant and could prevent
# optimizations. This cannot be done if we're dealing with the
# last field as INSERT statements require at least one.
if len(fields) > 1 and all(
isinstance(value, DatabaseDefault) for value in field_values
):
fields.remove(field)
continue
elif not supports_default_keyword_in_bulk_insert:
# If the field cannot be excluded from the INSERT for the
# reasons listed above and the backend doesn't support the
# DEFAULT keyword each values must be expanded into their
# underlying expressions.
prepared_db_default = field_prepare(field.db_default)
field_values = [
(
prepared_db_default
if isinstance(value, DatabaseDefault)
else value
)
for value in field_values
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier to read if we negate some checks and explicitly continue:

Suggested change
if field.has_db_default():
# If all values are DEFAULT don't include the field and its
# values in the query as they are redundant and could prevent
# optimizations. This cannot be done if we're dealing with the
# last field as INSERT statements require at least one.
if len(fields) > 1 and all(
isinstance(value, DatabaseDefault) for value in field_values
):
fields.remove(field)
continue
elif not supports_default_keyword_in_bulk_insert:
# If the field cannot be excluded from the INSERT for the
# reasons listed above and the backend doesn't support the
# DEFAULT keyword each values must be expanded into their
# underlying expressions.
prepared_db_default = field_prepare(field.db_default)
field_values = [
(
prepared_db_default
if isinstance(value, DatabaseDefault)
else value
)
for value in field_values
]
if not field.has_db_default():
value_cols.append(field_values)
continue
# If all values are DEFAULT don't include the field and its
# values in the query as they are redundant and could prevent
# optimizations. This cannot be done if we're dealing with the
# last field as INSERT statements require at least one.
if len(fields) > 1 and all(
isinstance(value, DatabaseDefault) for value in field_values
):
fields.remove(field)
continue
if supports_default_keyword_in_bulk_insert:
value_cols.append(field_values)
continue
# If the field cannot be excluded from the INSERT for the
# reasons listed above and the backend doesn't support the
# DEFAULT keyword each values must be expanded into their
# underlying expressions.
prepared_db_default = field_prepare(field.db_default)
field_values = [
(
prepared_db_default
if isinstance(value, DatabaseDefault)
else value
)
for value in field_values
]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

if len(fields) > 1 and all(
isinstance(value, DatabaseDefault) for value in field_values
):
fields.remove(field)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mutating self.query.fields here safe?

Copy link
Member Author

@charettes charettes Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. In practice InsertQuery is terminal and not chained but it's just bad practice to mutate an object like that.

@charettes charettes force-pushed the insert-db-default-elide branch from 4e4f400 to bb42041 Compare January 12, 2025 16:03
@charettes
Copy link
Member Author

Thanks for the review @LilyFoote. All comments should be addressed.

@charettes
Copy link
Member Author

Oracle failures are unrelated and are due to issues in #18994 fixed by #19035.

When all values of a field with a db_default are DatabaseDefault, which is the
case most of the time, there is no point in specifying explicit DEFAULT for all
INSERT VALUES as that's what the database will do anyway if not specified.

In the case of Postgresql doing so can even be harmful as it prevents the usage
of the UNNEST strategy and in the case of Oracle, which doesn't support the
usage of the DEFAULT keyword, it unnecessarily requires providing literal
db defaults.

Thanks Lily Foote for the review.
@sarahboyce sarahboyce force-pushed the insert-db-default-elide branch from bb42041 to b9d920e Compare January 13, 2025 11:56
@sarahboyce
Copy link
Contributor

buildbot, test on oracle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants