Optimize `insert` performance by batching #3621

princejha95 · 2023-10-19T17:14:52Z

Description

Closes #3271.

Readiness checklist

I added/updated unit tests (and they pass).
I added/updated integration/compatibility tests (and they pass).
I added/updated comments and checked rendering.
I made spot refactorings.
I updated user documentation.
I ran task all, and it passed.
I ensured that PR title is good enough for the changelog.
(for maintainers only) I set Reviewers (@FerretDB/core), Milestone (Next), Labels, Project and project's Sprint fields.
I marked all done items in this checklist.

codecov · 2023-10-19T17:19:19Z

Codecov Report

Merging #3621 (4ee8970) into main (ebf46cf) will decrease coverage by 0.10%.
The diff coverage is 89.39%.

@@            Coverage Diff             @@
##             main    #3621      +/-   ##
==========================================
- Coverage   74.17%   74.07%   -0.10%     
==========================================
  Files         370      372       +2     
  Lines       23564    23626      +62     
==========================================
+ Hits        17478    17502      +24     
- Misses       5059     5093      +34     
- Partials     1027     1031       +4

Files	Coverage Δ
internal/backends/sqlite/collection.go	`80.15% <100.00%> (-1.10%)`	⬇️
internal/backends/postgresql/collection.go	`71.35% <81.81%> (-0.56%)`	⬇️
internal/backends/postgresql/insert.go	`88.88% <88.88%> (ø)`
internal/backends/sqlite/insert.go	`88.46% <88.46%> (ø)`
internal/handlers/sqlite/msg_insert.go	`79.52% <89.83%> (+4.52%)`	⬆️

... and 12 files with indirect coverage changes

Flag	Coverage Δ
filter-true	`70.12% <89.39%> (-0.11%)`	⬇️
hana-1	`?`
integration	`70.12% <89.39%> (-0.11%)`	⬇️
mongodb-1	`5.28% <0.00%> (-0.02%)`	⬇️
postgresql-1	`50.94% <56.81%> (+0.20%)`	⬆️
postgresql-2	`50.21% <44.69%> (-0.01%)`	⬇️
postgresql-3	`49.13% <55.30%> (+0.03%)`	⬆️
sort-false	`70.12% <89.39%> (-0.11%)`	⬇️
sqlite-1	`50.24% <56.06%> (+<0.01%)`	⬆️
sqlite-2	`49.50% <43.93%> (-0.03%)`	⬇️
sqlite-3	`48.39% <54.54%> (+0.03%)`	⬆️
unit	`29.03% <49.24%> (+0.11%)`	⬆️

Flags with carried forward coverage won't be shown.

AlekSi

Yeah, let's start by fixing the code so that the integration tests pass.

mergify · 2023-10-20T17:40:43Z

@princejha95 this pull request has merge conflicts.

chilagrow

Thanks for your contribution 🤗, I made changes for CI to pass.

noisersup

Looks great!

AlekSi · 2023-10-30T16:54:33Z

internal/handlers/sqlite/msg_insert.go


 				if params.Ordered {
 					break
 				}
+			} else {


Let's simplify that with

if err = doc.ValidateData(); err == nil {
	docs = append(docs, doc)
	docsIndexes = append(docsIndexes, int32(i))
	continue
}

Thanks, let's do 🙏
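
For readers following the thread, the early-`continue` shape suggested above can be sketched in isolation. This is only an illustration: the `doc` type, its fields, and the `collectValid` helper are hypothetical stand-ins, not the handler's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical stand-in for a document with a ValidateData method.
type doc struct {
	id    int
	valid bool
}

// ValidateData reports whether the document may be stored.
func (d doc) ValidateData() error {
	if !d.valid {
		return errors.New("invalid document")
	}
	return nil
}

// collectValid returns the documents that passed validation together with
// their positions in the input. For ordered inserts it stops at the first failure.
func collectValid(in []doc, ordered bool) (docs []doc, docsIndexes []int32, errs []error) {
	for i, d := range in {
		err := d.ValidateData()
		if err == nil {
			docs = append(docs, d)
			docsIndexes = append(docsIndexes, int32(i))
			continue
		}

		// Only the failure path remains below the early continue.
		errs = append(errs, fmt.Errorf("index %d: %w", i, err))
		if ordered {
			break
		}
	}

	return docs, docsIndexes, errs
}

func main() {
	in := []doc{{1, true}, {2, false}, {3, true}}

	docs, idx, errs := collectValid(in, false)
	fmt.Println(len(docs), idx, len(errs)) // 2 [0 2] 1

	docs, idx, errs = collectValid(in, true)
	fmt.Println(len(docs), idx, len(errs)) // 1 [0] 1
}
```

Keeping the success path first removes the `else` branch, so the error handling no longer needs extra nesting.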

AlekSi · 2023-10-30T17:02:03Z

internal/backends/sqlite/collection.go

+		var j int
+		for i := 0; i < len(params.Docs); i += batchSize {
+			if j += batchSize; j > len(params.Docs) {
+				j = len(params.Docs)
 			}


I think that could be simplified by using a slice instead of a second index j:

docs := params.Docs
// ...
i := min(batchSize, len(docs))
batch, docs := docs[:i], docs[i:]

Thanks, it's easier to read too.
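
A minimal, self-contained sketch of the re-slicing idea from this thread, assuming Go 1.21 or later for the built-in `min`. The `splitBatches` helper is hypothetical and only demonstrates the loop shape. Inside a loop body, `batch, docs := ...` would declare a new `docs` that shadows the outer one, so the sketch uses plain assignment instead.

```go
package main

import "fmt"

// splitBatches cuts docs into consecutive batches of at most batchSize items.
// It is a hypothetical helper used only to illustrate the technique.
func splitBatches[T any](docs []T, batchSize int) [][]T {
	if batchSize <= 0 {
		return nil
	}

	var res [][]T

	for len(docs) > 0 {
		i := min(batchSize, len(docs))

		// Plain assignment: the remaining documents replace the outer slice,
		// so no second index is needed.
		var batch []T
		batch, docs = docs[:i], docs[i:]

		res = append(res, batch)
	}

	return res
}

func main() {
	fmt.Println(splitBatches([]int{1, 2, 3, 4, 5}, 2)) // [[1 2] [3 4] [5]]
}
```

The batches share the backing array of the input slice, so no documents are copied.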

AlekSi · 2023-10-30T17:43:02Z

internal/backends/sqlite/collection.go

-					metadata.DefaultColumn,
-				)
-				args = append(args, doc.RecordID())
+			q, args, err := prepareInsertStatement(meta.TableName, meta.Settings.CappedSize > 0, params.Docs[i:j])


Let's add a method for meta.Settings.CappedSize > 0

It was already added by another PR, using it now 👍
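
To make the two ideas in this thread concrete, here is a rough sketch of a predicate method replacing the inline `CappedSize > 0` check, plus a builder for a single multi-row `INSERT`. Everything here is an assumption for illustration: the `settings` and `row` types, the `Capped` and `buildInsert` names, and the column names are hypothetical and do not reproduce the project's `prepareInsertStatement`.

```go
package main

import (
	"fmt"
	"strings"
)

// settings is a hypothetical stand-in for collection settings.
type settings struct {
	CappedSize int64
}

// Capped reports whether the collection is capped.
func (s settings) Capped() bool {
	return s.CappedSize > 0
}

// row is a hypothetical pre-encoded document: its record ID and JSON payload.
type row struct {
	recordID int64
	payload  string
}

// buildInsert returns one multi-row INSERT statement with `?` placeholders
// and the matching arguments. For capped collections the record ID column
// is included as well. It returns an empty query for an empty batch.
func buildInsert(table string, capped bool, rows []row) (string, []any) {
	if len(rows) == 0 {
		return "", nil
	}

	cols, ph := "(doc)", "(?)"
	if capped {
		cols, ph = "(record_id, doc)", "(?, ?)"
	}

	placeholders := make([]string, len(rows))
	args := make([]any, 0, len(rows)*2)

	for i, r := range rows {
		placeholders[i] = ph

		if capped {
			args = append(args, r.recordID)
		}

		args = append(args, r.payload)
	}

	q := fmt.Sprintf("INSERT INTO %q %s VALUES %s", table, cols, strings.Join(placeholders, ", "))

	return q, args
}

func main() {
	s := settings{CappedSize: 1024}
	q, args := buildInsert("test", s.Capped(), []row{{1, `{"a":1}`}, {2, `{"a":2}`}})
	fmt.Println(q)
	fmt.Println(len(args))
}
```

One statement per batch means one round trip to the database per batch instead of one per document.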

AlekSi · 2023-10-30T17:43:48Z

internal/handlers/sqlite/msg_insert.go

 	for {
-		i, d, err := docsIter.Next()
-		if errors.Is(err, iterator.ErrIteratorDone) {
+		if done {
 			break
 		}


for !done {
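
As a small illustration of the `for !done` form, the sketch below drains a source in fixed-size batches. The `next` and `flush` callbacks and the `drainInBatches` name are hypothetical and stand in for the iterator and the backend insert call.

```go
package main

import "fmt"

// drainInBatches reads values from next until it reports false and passes
// them to flush in groups of at most batchSize.
func drainInBatches(next func() (int, bool), batchSize int, flush func([]int)) {
	if batchSize <= 0 {
		return
	}

	var done bool

	// The loop condition replaces an explicit `if done { break }` at the top.
	for !done {
		batch := make([]int, 0, batchSize)

		for len(batch) < batchSize {
			v, ok := next()
			if !ok {
				done = true
				break
			}

			batch = append(batch, v)
		}

		if len(batch) > 0 {
			flush(batch)
		}
	}
}

func main() {
	vals := []int{1, 2, 3, 4, 5}
	pos := 0

	next := func() (int, bool) {
		if pos >= len(vals) {
			return 0, false
		}

		v := vals[pos]
		pos++

		return v, true
	}

	drainInBatches(next, 2, func(b []int) { fmt.Println(b) })
}
```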

AlekSi · 2023-10-31T15:15:45Z

task: [bench-postgresql] ../bin/benchstat old-postgresql.txt new-postgresql.txt
goos: darwin
goarch: arm64
pkg: github.com/FerretDB/FerretDB/integration
                                                        │ old-postgresql.txt │          new-postgresql.txt          │
                                                        │       sec/op       │    sec/op     vs base                │
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1-10               548.6m ±  2%   595.5m ±  2%   +8.55% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch10-10             467.63m ±  3%   73.34m ± 11%  -84.32% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch100-10            456.81m ±  1%   30.02m ±  5%  -93.43% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1000-10           454.93m ± 38%   17.26m ±  7%  -96.21% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1-10             5.248 ±  4%    5.083 ± 21%        ~ (p=0.481 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch10-10          1569.5m ±  2%   958.8m ±  5%  -38.91% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch100-10         1437.0m ±  2%   771.6m ±  6%  -46.31% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1000-10        1646.1m ± 10%   723.0m ±  1%  -56.08% (p=0.000 n=10)
geomean                                                          1.005         297.6m        -70.38%

                                                        │ old-postgresql.txt │          new-postgresql.txt          │
                                                        │        B/op        │     B/op      vs base                │
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1-10               68.38Mi ± 0%   76.23Mi ± 0%  +11.48% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch10-10              22.28Mi ± 0%   16.20Mi ± 0%  -27.29% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch100-10             17.74Mi ± 0%   10.88Mi ± 0%  -38.65% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1000-10            17.41Mi ± 0%   10.44Mi ± 0%  -40.07% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1-10            590.9Mi ± 0%   597.7Mi ± 0%   +1.15% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch10-10           639.6Mi ± 0%   785.6Mi ± 0%  +22.82% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch100-10          677.3Mi ± 0%   885.4Mi ± 0%  +30.73% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1000-10         668.3Mi ± 0%   876.8Mi ± 0%  +31.20% (p=0.000 n=10)
geomean                                                         129.8Mi        122.6Mi        -5.52%

                                                        │ old-postgresql.txt │         new-postgresql.txt          │
                                                        │     allocs/op      │  allocs/op   vs base                │
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1-10                698.7k ± 0%   702.2k ± 0%   +0.49% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch10-10               300.1k ± 0%   152.3k ± 0%  -49.24% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch100-10             259.70k ± 0%   95.57k ± 0%  -63.20% (p=0.000 n=10)
InsertMany/SmallDocuments/Docs1000/7dc4/Batch1000-10            256.43k ± 0%   91.13k ± 0%  -64.46% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1-10             2.980M ± 0%   2.981M ± 0%        ~ (p=0.393 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch10-10            2.570M ± 0%   2.415M ± 0%   -6.05% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch100-10           2.521M ± 0%   2.351M ± 0%   -6.73% (p=0.000 n=10)
InsertMany/SettingsDocuments/Docs1000/b34e/Batch1000-10          2.515M ± 0%   2.346M ± 0%   -6.74% (p=0.000 n=10)
geomean                                                          952.6k        662.2k       -30.48%

Changes to optimise inserts by making them call in batches

9f2e117

princejha95 requested review from AlekSi and a team as code owners October 19, 2023 17:14

princejha95 requested a review from rumyantseva October 19, 2023 17:14

mergify bot assigned princejha95 Oct 19, 2023

AlekSi reviewed Oct 20, 2023

mergify bot added the conflict PRs that have merge conflicts label Oct 20, 2023

chilagrow self-assigned this Oct 26, 2023

Merge branch 'main' into clean_optimise_inserts

e2b6512

mergify bot removed the conflict PRs that have merge conflicts label Oct 27, 2023

chilagrow added 2 commits October 27, 2023 15:52

make test pass for unordered insert

7f05d4f

add test cases for multiple fail

2fa61ce

chilagrow added the code/enhancement Some user-visible feature could work better label Oct 27, 2023

chilagrow added this to the Next milestone Oct 27, 2023

chilagrow requested review from a team, chilagrow, noisersup and AlekSi October 27, 2023 07:49

chilagrow changed the title ~~Changes to optimise inserts by making them call in batches~~ Optimize insert by batch inserts Oct 27, 2023

chilagrow enabled auto-merge (squash) October 27, 2023 07:56

chilagrow previously approved these changes Oct 27, 2023

noisersup previously approved these changes Oct 27, 2023

Merge branch 'main' into clean_optimise_inserts

7ec7c20

chilagrow dismissed stale reviews from noisersup and themself via 7ec7c20 October 30, 2023 01:27

fix merge

cb0c9cf

chilagrow requested review from chilagrow and noisersup October 30, 2023 01:34

chilagrow previously approved these changes Oct 30, 2023

Merge branch 'main' into clean_optimise_inserts

402444a

AlekSi changed the title ~~Optimize insert by batch inserts~~ Optimize insert performance by batching Oct 30, 2023

AlekSi disabled auto-merge October 30, 2023 15:50

AlekSi enabled auto-merge (squash) October 30, 2023 15:50

AlekSi reviewed Oct 30, 2023

chilagrow added 2 commits October 31, 2023 12:17

simplify code

e285462

Merge branch 'clean_optimise_inserts' of github.com:princejha95/Ferre…

3b09ad4

…tDB into clean_optimise_inserts

chilagrow dismissed their stale review via 3b09ad4 October 31, 2023 03:19

simplify fallback insert

bdc6b8b

chilagrow requested review from AlekSi and chilagrow October 31, 2023 04:09

AlekSi added 3 commits October 31, 2023 15:18

Merge branch 'main' into clean_optimise_inserts

fd9d6ea

Small cleanups

788aca4

Tiny cleanup

4ee8970

AlekSi disabled auto-merge October 31, 2023 15:16

AlekSi merged commit ba384d9 into FerretDB:main Oct 31, 2023
27 checks passed
