New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Fixed #36088 -- Avoided unnecessary DEFAULT usage on bulk_create(). #18909

Open

charettes wants to merge 1 commit into django:main from charettes:insert-db-default-elide

+91 −22

Member

charettes commented Dec 10, 2024 •

edited by sarahboyce

Loading

Trac ticket number

Branch description

This is still a draft that builds on top of #18847 to demonstrate the benefit of avoid to use DEFAULT when unneessary.

For example, given the following model

class Article(models.model):
   created_at = models.DateTimeField(db_default=Now())
   title = models.CharField()

Then

Article.objects.bulk_create([Article(title="foo"), ..., Article(title="bar")])

will result in the following SQL

INSERT INTO article (created_at, title) VALUES (DEFAULT, %s), ..., (DEFAULT, %s)

while DEFAULT could be entirely omitted since well, it's the database default?

INSERT INTO article (title) VALUES (%s), ..., (%s)

I discovered this case when testing the UNNSET strategy as unfortunately the unnecessary usage of DEFAULT disables the optimization.

charettes mentioned this pull request

Fixed #35936 -- Used unnest for bulk inserts on Postgres when possible. #18847

Merged

charettes force-pushed the insert-db-default-elide branch 4 times, most recently from e645795 to 9861770 Compare

December 10, 2024 03:36

felixxm self-requested a review

December 10, 2024 07:00

alexandernst commented Dec 10, 2024

Just out of technical curiosity: how much of a performance gain can we expect from this? I imagine it depends on the number of inserted models, but maybe you ran tests with a fairly large sample dataset?

Member Author

charettes commented Dec 10, 2024 •

edited

Loading

@alexandernst this PR is an extension of #18847 that allows the optimization implemented over there to be enabled even when the model includes fields making use of db_default

I haven't tested how much of a speed up not specifying DEFAULT when unnecessary (when an INSERT field value is only DEFAULT) but I suspect most of the benefits will be from query generation speed perspective. The main benefits here are that it reduces the amount of parameter bindings for Oracle and allows the UNNEST approach to be used on Postgres when it wasn't the case before.

As for the UNNEST speed up itself you can find more about it on the forum. There were reports of 25% speed up for 1k rows with ten columns but I assume that mileage will vary depending on the number of indices that you have on the table and other schema specificities.

Contributor

LilyFoote commented Jan 9, 2025 •

edited

Loading

It might make sense to split out the commit adding has_db_default to a separate PR. That would de-clutter this PR and make that available in #19008.

Member Author

charettes commented Jan 9, 2025

Good idea @LilyFoote; I'll work on that later today.

charettes force-pushed the insert-db-default-elide branch from 9861770 to 8303ded Compare

January 10, 2025 05:25

charettes mentioned this pull request

Refs #36064 -- Added Model.has_db_default() to encapsulate NOT_PROVIDED checks. #19021

Merged

charettes force-pushed the insert-db-default-elide branch from 8303ded to b7dcafb Compare

January 11, 2025 04:00

charettes commented

View reviewed changes

django/db/models/sql/compiler.py Show resolved Hide resolved

charettes mentioned this pull request

Fixed #36086 -- Fixed crash when using GeneratedField with non-AutoField pk. #19024

Merged

charettes force-pushed the insert-db-default-elide branch from b7dcafb to 73934aa Compare

January 12, 2025 06:25

charettes changed the title ~~Avoided unnecessary DEFAULT usage on bulk_create().~~ Fixed #36088 -- Avoided unnecessary DEFAULT usage on bulk_create().

charettes marked this pull request as ready for review

January 12, 2025 06:26

charettes commented

View reviewed changes

django/db/models/sql/compiler.py Outdated Show resolved Hide resolved

charettes commented

View reviewed changes

django/db/models/sql/compiler.py Outdated

Comment on lines 1838 to 1851

+                                  elif not supports_default_keyword_in_bulk_insert:
+                                      # If the field cannot be excluded from the INSERT for the
+                                      # reasons listed above and the backend doesn't support the
+                                      # DEFAULT keyword each values must be expanded into their
+                                      # underlying expressions.
+                                      prepared_db_default = field_prepare(field.db_default)
+                                      field_values = [
+                                          (
+                                              prepared_db_default
+                                              if isinstance(value, DatabaseDefault)
+                                              else value
+                                          )
+                                          for value in field_values
+                                      ]

Member Author

charettes Jan 12, 2025

This branch is already covered by tests and has simply be moved over from _prepare_for_bulk_create.

charettes force-pushed the insert-db-default-elide branch from 73934aa to 4e4f400 Compare

January 12, 2025 07:15

LilyFoote reviewed

View reviewed changes

django/db/models/sql/compiler.py Outdated

Comment on lines 1843 to 1846

+                                      prepared_db_default = field_prepare(field.db_default)
+                                      field_values = [
+                                          (
+                                              prepared_db_default

Contributor

LilyFoote Jan 12, 2025

Suggested change

      
                                    prepared_db_default = field_prepare(field.db_default)
          
                                    field_values = [
          
                                        (
          
                                            prepared_db_default
          
                                    field_values = [
          
                                        (
          
                                            field_prepare(field.db_default)

Member Author

charettes Jan 12, 2025

This is done explicitly. field.db_default is the same for all instances and field_prepare is not cheap so it's better to prepare it once and use the same value than prepare it N times.

Contributor

LilyFoote Jan 12, 2025

Ah, gotcha!

django/db/models/sql/compiler.py Show resolved Hide resolved

django/db/models/sql/compiler.py Outdated

Comment on lines 1828 to 1851

+                              if field.has_db_default():
+                                  # If all values are DEFAULT don't include the field and its
+                                  # values in the query as they are redundant and could prevent
+                                  # optimizations. This cannot be done if we're dealing with the
+                                  # last field as INSERT statements require at least one.
+                                  if len(fields) > 1 and all(
+                                      isinstance(value, DatabaseDefault) for value in field_values
+                                  ):
+                                      fields.remove(field)
+                                      continue
+                                  elif not supports_default_keyword_in_bulk_insert:
+                                      # If the field cannot be excluded from the INSERT for the
+                                      # reasons listed above and the backend doesn't support the
+                                      # DEFAULT keyword each values must be expanded into their
+                                      # underlying expressions.
+                                      prepared_db_default = field_prepare(field.db_default)
+                                      field_values = [
+                                          (
+                                              prepared_db_default
+                                              if isinstance(value, DatabaseDefault)
+                                              else value
+                                          )
+                                          for value in field_values
+                                      ]

Contributor

LilyFoote Jan 12, 2025

It might be easier to read if we negate some checks and explicitly continue:

Suggested change

      
                            if field.has_db_default():
          
                                # If all values are DEFAULT don't include the field and its
          
                                # values in the query as they are redundant and could prevent
          
                                # optimizations. This cannot be done if we're dealing with the
          
                                # last field as INSERT statements require at least one.
          
                                if len(fields) > 1 and all(
          
                                    isinstance(value, DatabaseDefault) for value in field_values
          
                                ):
          
                                    fields.remove(field)
          
                                    continue
          
                                elif not supports_default_keyword_in_bulk_insert:
          
                                    # If the field cannot be excluded from the INSERT for the
          
                                    # reasons listed above and the backend doesn't support the
          
                                    # DEFAULT keyword each values must be expanded into their
          
                                    # underlying expressions.
          
                                    prepared_db_default = field_prepare(field.db_default)
          
                                    field_values = [
          
                                        (
          
                                            prepared_db_default
          
                                            if isinstance(value, DatabaseDefault)
          
                                            else value
          
                                        )
          
                                        for value in field_values
          
                                    ]
          
                            if not field.has_db_default():
          
                                value_cols.append(field_values)
          
                                continue
          
                            # If all values are DEFAULT don't include the field and its
          
                            # values in the query as they are redundant and could prevent
          
                            # optimizations. This cannot be done if we're dealing with the
          
                            # last field as INSERT statements require at least one.
          
                            if len(fields) > 1 and all(
          
                                isinstance(value, DatabaseDefault) for value in field_values
          
                            ):
          
                                fields.remove(field)
          
                                continue
          
                            if supports_default_keyword_in_bulk_insert:
          
                                value_cols.append(field_values)
          
                                continue
          
                            # If the field cannot be excluded from the INSERT for the
          
                            # reasons listed above and the backend doesn't support the
          
                            # DEFAULT keyword each values must be expanded into their
          
                            # underlying expressions.
          
                            prepared_db_default = field_prepare(field.db_default)
          
                            field_values = [
          
                                (
          
                                    prepared_db_default
          
                                    if isinstance(value, DatabaseDefault)
          
                                    else value
          
                                )
          
                                for value in field_values
          
                            ]

Member Author

charettes Jan 12, 2025

I like it!

django/db/models/sql/compiler.py Outdated

+                                  if len(fields) > 1 and all(
+                                      isinstance(value, DatabaseDefault) for value in field_values
+                                  ):
+                                      fields.remove(field)

Contributor

LilyFoote Jan 12, 2025

Is mutating self.query.fields here safe?

Member Author

charettes Jan 12, 2025 •

edited

Loading

Good catch. In practice InsertQuery is terminal and not chained but it's just bad practice to mutate an object like that.

charettes force-pushed the insert-db-default-elide branch from 4e4f400 to bb42041 Compare

January 12, 2025 16:03

Member Author

charettes commented Jan 12, 2025

Thanks for the review @LilyFoote. All comments should be addressed.

LilyFoote approved these changes

View reviewed changes

Member Author

charettes commented Jan 12, 2025

Oracle failures are unrelated and are due to issues in #18994 fixed by #19035.

charettes mentioned this pull request

Fixed #36050 -- Added OuterRef support to CompositePrimaryKey. #18994

Merged

6 tasks


          Fixed #36088 -- Avoided unnecessary DEFAULT usage on bulk_create().

b9d920e

When all values of a field with a db_default are DatabaseDefault, which is the
case most of the time, there is no point in specifying explicit DEFAULT for all
INSERT VALUES as that's what the database will do anyway if not specified.

In the case of Postgresql doing so can even be harmful as it prevents the usage
of the UNNEST strategy and in the case of Oracle, which doesn't support the
usage of the DEFAULT keyword, it unnecessarily requires providing literal
db defaults.

Thanks Lily Foote for the review.

sarahboyce force-pushed the insert-db-default-elide branch from bb42041 to b9d920e Compare

January 13, 2025 11:56

Contributor

sarahboyce commented Jan 13, 2025

buildbot, test on oracle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet