Optimize insert performance by batching #3621

Merged 13 commits into FerretDB:main on Oct 31, 2023

Conversation

@princejha95 (Contributor) commented Oct 19, 2023

Description

Closes #3271.

Readiness checklist

  • I added/updated unit tests (and they pass).
  • I added/updated integration/compatibility tests (and they pass).
  • I added/updated comments and checked rendering.
  • I made spot refactorings.
  • I updated user documentation.
  • I ran task all, and it passed.
  • I ensured that PR title is good enough for the changelog.
  • (for maintainers only) I set Reviewers (@FerretDB/core), Milestone (Next), Labels, Project and project's Sprint fields.
  • I marked all done items in this checklist.

@princejha95 princejha95 requested review from AlekSi and a team as code owners October 19, 2023 17:14

codecov bot commented Oct 19, 2023

Codecov Report

Merging #3621 (4ee8970) into main (ebf46cf) will decrease coverage by 0.10%.
The diff coverage is 89.39%.

@@            Coverage Diff             @@
##             main    #3621      +/-   ##
==========================================
- Coverage   74.17%   74.07%   -0.10%     
==========================================
  Files         370      372       +2     
  Lines       23564    23626      +62     
==========================================
+ Hits        17478    17502      +24     
- Misses       5059     5093      +34     
- Partials     1027     1031       +4     
Files                                        Coverage Δ
internal/backends/sqlite/collection.go       80.15% <100.00%> (-1.10%) ⬇️
internal/backends/postgresql/collection.go   71.35% <81.81%> (-0.56%) ⬇️
internal/backends/postgresql/insert.go       88.88% <88.88%> (ø)
internal/backends/sqlite/insert.go           88.46% <88.46%> (ø)
internal/handlers/sqlite/msg_insert.go       79.52% <89.83%> (+4.52%) ⬆️

... and 12 files with indirect coverage changes

Flag           Coverage Δ
filter-true    70.12% <89.39%> (-0.11%) ⬇️
hana-1         ?
integration    70.12% <89.39%> (-0.11%) ⬇️
mongodb-1      5.28% <0.00%> (-0.02%) ⬇️
postgresql-1   50.94% <56.81%> (+0.20%) ⬆️
postgresql-2   50.21% <44.69%> (-0.01%) ⬇️
postgresql-3   49.13% <55.30%> (+0.03%) ⬆️
sort-false     70.12% <89.39%> (-0.11%) ⬇️
sqlite-1       50.24% <56.06%> (+<0.01%) ⬆️
sqlite-2       49.50% <43.93%> (-0.03%) ⬇️
sqlite-3       48.39% <54.54%> (+0.03%) ⬆️
unit           29.03% <49.24%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

@AlekSi (Member) left a comment

Yeah, let's start by fixing the code so that the integration tests pass.

mergify bot (Contributor) commented Oct 20, 2023

@princejha95 this pull request has merge conflicts.

@mergify mergify bot added the "conflict" label (PRs that have merge conflicts) Oct 20, 2023
@chilagrow chilagrow self-assigned this Oct 26, 2023
@mergify mergify bot removed the "conflict" label (PRs that have merge conflicts) Oct 27, 2023
@chilagrow chilagrow added the "code/enhancement" label (Some user-visible feature could work better) Oct 27, 2023
@chilagrow chilagrow added this to the Next milestone Oct 27, 2023
@chilagrow chilagrow requested review from a team, chilagrow, noisersup and AlekSi October 27, 2023 07:49
@chilagrow chilagrow changed the title from "Changes to optimise inserts by making them call in batches" to "Optimize insert by batch inserts" Oct 27, 2023
@chilagrow chilagrow enabled auto-merge (squash) October 27, 2023 07:56
chilagrow previously approved these changes Oct 27, 2023

@chilagrow (Member) left a comment

Thanks for your contribution 🤗, I made changes for CI to pass.

noisersup previously approved these changes Oct 27, 2023

@noisersup (Member) left a comment

Looks great!

@chilagrow chilagrow dismissed stale reviews from noisersup and themself via 7ec7c20 October 30, 2023 01:27
chilagrow previously approved these changes Oct 30, 2023
@AlekSi AlekSi changed the title from "Optimize insert by batch inserts" to "Optimize insert performance by batching" Oct 30, 2023
@AlekSi AlekSi disabled auto-merge October 30, 2023 15:50
@AlekSi AlekSi enabled auto-merge (squash) October 30, 2023 15:50

    if params.Ordered {
      break
    }
  } else {

Member

Let's simplify that with:

if err = doc.ValidateData(); err == nil {
  docs = append(docs, doc)
  docsIndexes = append(docsIndexes, int32(i))
  continue
}

Member

Thanks, let's do 🙏
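
For readers following the thread, the suggested shape can be sketched as a complete loop. This is only an illustration under assumptions: the `doc` type, the `writeError` struct and the `collectValid` helper are hypothetical stand-ins, not the project's actual types. Only `ValidateData`, `docs`, `docsIndexes` and the ordered flag come from the snippets above.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical stand-in for the real document type.
type doc struct {
	id    int
	valid bool
}

// ValidateData mimics a per-document validation step.
func (d doc) ValidateData() error {
	if !d.valid {
		return errors.New("invalid document")
	}
	return nil
}

// writeError (hypothetical) records which input position failed and why.
type writeError struct {
	index int32
	err   error
}

// collectValid returns the documents that passed validation, their original
// positions, and the errors for the rest. With ordered set, it stops at the
// first invalid document.
func collectValid(in []doc, ordered bool) ([]doc, []int32, []writeError) {
	var docs []doc
	var docsIndexes []int32
	var writeErrors []writeError

	for i, d := range in {
		err := d.ValidateData()
		if err == nil {
			// Happy path first: keep the document and remember its index.
			docs = append(docs, d)
			docsIndexes = append(docsIndexes, int32(i))
			continue
		}

		writeErrors = append(writeErrors, writeError{index: int32(i), err: err})
		if ordered {
			break
		}
	}

	return docs, docsIndexes, writeErrors
}

func main() {
	in := []doc{{1, true}, {2, false}, {3, true}}
	docs, idx, errs := collectValid(in, false)
	fmt.Println(len(docs), idx, len(errs)) // prints "2 [0 2] 1"
}
```

Handling the valid case first and using `continue` removes the `else` branch, so the error handling that follows stays at one indentation level.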

Comment on lines 116 to 120
var j int
for i := 0; i < len(params.Docs); i += batchSize {
  if j += batchSize; j > len(params.Docs) {
    j = len(params.Docs)
  }

Member

I think that could be simplified by using a slice instead of a second index j:

docs := params.Docs

// ...

i := min(batchSize, len(docs))
batch := docs[:i]
docs = docs[i:] // plain assignment, so docs is not shadowed inside a loop body

Member

Thanks, it's easier to read too.
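
As a self-contained illustration of the slice-based approach discussed above, here is a minimal sketch. The generic `splitBatches` helper is hypothetical and not part of the PR; it assumes Go 1.21 or later for the built-in `min`.

```go
package main

import "fmt"

// splitBatches cuts items into consecutive batches of at most batchSize
// elements by repeatedly re-slicing, so no second index is needed.
// It is a hypothetical helper written for illustration.
func splitBatches[T any](items []T, batchSize int) [][]T {
	if batchSize <= 0 {
		return nil
	}

	var batches [][]T

	for len(items) > 0 {
		i := min(batchSize, len(items))

		// Assign to the outer items with "=" so the loop makes progress.
		var batch []T
		batch, items = items[:i], items[i:]

		batches = append(batches, batch)
	}

	return batches
}

func main() {
	fmt.Println(splitBatches([]int{1, 2, 3, 4, 5}, 2)) // prints "[[1 2] [3 4] [5]]"
}
```

The batches share the backing array of the input, so nothing is copied; a caller that needs to modify a batch independently would have to copy it first.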

metadata.DefaultColumn,
)
args = append(args, doc.RecordID())
q, args, err := prepareInsertStatement(meta.TableName, meta.Settings.CappedSize > 0, params.Docs[i:j])

Member

Let's add a method for meta.Settings.CappedSize > 0

Member

It was already added by another PR, using it now 👍
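
To make the idea behind a batched statement concrete, here is a rough sketch of building one multi-row INSERT for a batch. It is not the PR's `prepareInsertStatement`: the `buildInsert` function, the `row` and `settings` types, the `capped` method name, and the table and column names are all assumptions made for illustration, and PostgreSQL-style `$n` placeholders are assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// settings is a hypothetical stand-in for collection settings.
type settings struct {
	CappedSize int64
}

// capped wraps the "CappedSize > 0" check in a method (the name is an assumption).
func (s settings) capped() bool {
	return s.CappedSize > 0
}

// row is a hypothetical pre-encoded document: its JSON and a record ID.
type row struct {
	json     string
	recordID int64
}

// buildInsert returns one multi-row INSERT with PostgreSQL-style placeholders
// and the matching arguments. The table name is interpolated as-is here;
// real code must quote or sanitize identifiers.
func buildInsert(table string, capped bool, rows []row) (string, []any) {
	columns := "(doc)"
	if capped {
		columns = "(record_id, doc)"
	}

	values := make([]string, 0, len(rows))
	args := make([]any, 0, len(rows)*2)

	for _, r := range rows {
		if capped {
			values = append(values, fmt.Sprintf("($%d, $%d)", len(args)+1, len(args)+2))
			args = append(args, r.recordID, r.json)
			continue
		}

		values = append(values, fmt.Sprintf("($%d)", len(args)+1))
		args = append(args, r.json)
	}

	q := fmt.Sprintf("INSERT INTO %s %s VALUES %s", table, columns, strings.Join(values, ", "))

	return q, args
}

func main() {
	s := settings{}
	q, args := buildInsert("docs", s.capped(), []row{{json: `{"a":1}`}, {json: `{"a":2}`}})
	fmt.Println(q)         // prints "INSERT INTO docs (doc) VALUES ($1), ($2)"
	fmt.Println(len(args)) // prints "2"
}
```

One statement per batch replaces one statement per document, which is presumably where most of the round-trip savings in this PR come from.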

Comment on lines 95 to 98
for {
i, d, err := docsIter.Next()
if errors.Is(err, iterator.ErrIteratorDone) {
if done {
break
}

Member

for !done {

Member

Thanks
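
The `for !done` form suggested above can be sketched as a loop that drains an iterator into batches. Everything here is a hypothetical stand-in: `sliceIter`, `errIteratorDone` (playing the role of `iterator.ErrIteratorDone`) and `drainInBatches` are not the project's code.

```go
package main

import (
	"errors"
	"fmt"
)

// errIteratorDone is a hypothetical sentinel standing in for iterator.ErrIteratorDone.
var errIteratorDone = errors.New("iterator is done")

// sliceIter is a minimal iterator over strings.
type sliceIter struct {
	items []string
	pos   int
}

// Next returns the next index and item, or errIteratorDone when exhausted.
func (it *sliceIter) Next() (int, string, error) {
	if it.pos >= len(it.items) {
		return 0, "", errIteratorDone
	}

	i := it.pos
	it.pos++

	return i, it.items[i], nil
}

// drainInBatches reads the iterator until it is exhausted and passes every
// full batch, plus the final partial one, to flush.
func drainInBatches(it *sliceIter, batchSize int, flush func([]string)) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}

	batch := make([]string, 0, batchSize)

	var done bool
	for !done {
		_, item, err := it.Next()

		switch {
		case err == nil:
			batch = append(batch, item)
		case errors.Is(err, errIteratorDone):
			done = true
		default:
			return err
		}

		if len(batch) == batchSize || (done && len(batch) > 0) {
			flush(batch)
			// Start a fresh slice so flush may keep the previous one.
			batch = make([]string, 0, batchSize)
		}
	}

	return nil
}

func main() {
	it := &sliceIter{items: []string{"a", "b", "c"}}
	err := drainInBatches(it, 2, func(b []string) { fmt.Println(b) })
	fmt.Println(err)
}
```

Putting the exit condition in the loop header leaves a single place where the loop ends, instead of a `break` nested inside the error check.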

@chilagrow chilagrow requested review from AlekSi and chilagrow October 31, 2023 04:09
AlekSi (Member) commented Oct 31, 2023

task: [bench-postgresql] ../bin/benchstat old-postgresql.txt new-postgresql.txt
goos: darwin
goarch: arm64
pkg: github.com/FerretDB/FerretDB/integration
                                                        │ old-postgresql.txt │          new-postgresql.txt          │
                                                        │       sec/op       │    sec/op     vs base                │
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1-10               548.6m ±  2%   595.5m ±  2%   +8.55% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch10-10             467.63m ±  3%   73.34m ± 11%  -84.32% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch100-10            456.81m ±  1%   30.02m ±  5%  -93.43% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1000-10           454.93m ± 38%   17.26m ±  7%  -96.21% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1-10             5.248 ±  4%    5.083 ± 21%        ~ (p=0.481 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch10-10          1569.5m ±  2%   958.8m ±  5%  -38.91% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch100-10         1437.0m ±  2%   771.6m ±  6%  -46.31% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1000-10        1646.1m ± 10%   723.0m ±  1%  -56.08% (p=0.000 n=10)
geomean                                                          1.005         297.6m        -70.38%

                                                        │ old-postgresql.txt │          new-postgresql.txt          │
                                                        │        B/op        │     B/op      vs base                │
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1-10               68.38Mi ± 0%   76.23Mi ± 0%  +11.48% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch10-10              22.28Mi ± 0%   16.20Mi ± 0%  -27.29% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch100-10             17.74Mi ± 0%   10.88Mi ± 0%  -38.65% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1000-10            17.41Mi ± 0%   10.44Mi ± 0%  -40.07% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1-10            590.9Mi ± 0%   597.7Mi ± 0%   +1.15% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch10-10           639.6Mi ± 0%   785.6Mi ± 0%  +22.82% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch100-10          677.3Mi ± 0%   885.4Mi ± 0%  +30.73% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1000-10         668.3Mi ± 0%   876.8Mi ± 0%  +31.20% (p=0.000 n=10)
geomean                                                         129.8Mi        122.6Mi        -5.52%

                                                        │ old-postgresql.txt │         new-postgresql.txt          │
                                                        │     allocs/op      │  allocs/op   vs base                │
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1-10                698.7k ± 0%   702.2k ± 0%   +0.49% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch10-10               300.1k ± 0%   152.3k ± 0%  -49.24% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch100-10             259.70k ± 0%   95.57k ± 0%  -63.20% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1000-10            256.43k ± 0%   91.13k ± 0%  -64.46% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1-10             2.980M ± 0%   2.981M ± 0%        ~ (p=0.393 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch10-10            2.570M ± 0%   2.415M ± 0%   -6.05% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch100-10           2.521M ± 0%   2.351M ± 0%   -6.73% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1000-10          2.515M ± 0%   2.346M ± 0%   -6.74% (p=0.000 n=10)
geomean                                                          952.6k        662.2k       -30.48%
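
For reading the "vs base" column, a small sketch of the arithmetic may help. It assumes the delta is the relative change between the two reported values; benchstat works from the underlying samples, so recomputing from rounded table values may differ in the last digit. The `deltaPercent` helper is hypothetical.

```go
package main

import "fmt"

// deltaPercent returns the relative change from base to cur, in percent.
// Negative values mean cur is smaller (faster, or fewer bytes/allocations).
func deltaPercent(base, cur float64) float64 {
	return (cur - base) / base * 100
}

func main() {
	// SmallDocuments/Batch10 sec/op from the table above: 467.63m -> 73.34m.
	fmt.Printf("%+.2f%%\n", deltaPercent(467.63, 73.34)) // prints "-84.32%"

	// SmallDocuments/Batch1 sec/op: 548.6m -> 595.5m.
	fmt.Printf("%+.2f%%\n", deltaPercent(548.6, 595.5)) // prints "+8.55%"
}
```

Note that the Batch1 rows regress slightly or stay within noise, while larger batch sizes show the large reductions in sec/op.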

@AlekSi AlekSi disabled auto-merge October 31, 2023 15:16
@AlekSi AlekSi merged commit ba384d9 into FerretDB:main Oct 31, 2023
27 checks passed
Labels: code/enhancement (Some user-visible feature could work better)
Projects: Archived in project
Development: successfully merging this pull request may close the issue "Cleanup and optimize inserts".
4 participants