Improve sjson package fuzzing #3071

quasilyte · 2023-07-18T12:41:07Z

Description

Previously, only FuzzDocument reached the marshal+unmarshal steps because of the document type assertion: if the value wasn't a documentType, the rest of the fuzzing function was skipped.

Since data validation is available only via types.Document.ValidateData, a helper function isValidDocumentData is introduced. It may go away in the future, but for now it lets values of every sjson type, not only documents, go through validation.

The other change concerns the way we fuzz sjson. Instead of generating values and schemas at the same time, each fuzz target now focuses on one thing at a time: a fixed schema against generated documents, and fixed documents against generated schemas.
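
A minimal, self-contained Go sketch of the split strategy follows. It does not use FerretDB's sjson package: `Schema`, `decode`, `roundTrip`, `checkFixedSchema`, and `checkFixedDocument` are hypothetical stand-ins, with `encoding/json` playing the role of the real marshaler and a field-to-kind map playing the role of an sjson schema. Each check holds one input fixed, so every fuzzer-generated input only has to be valid on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Schema is a hypothetical, heavily simplified stand-in for an sjson schema:
// it maps each top-level field name to the JSON kind that field must have.
type Schema map[string]string

func kindOf(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case float64:
		return "number"
	case bool:
		return "bool"
	default:
		return "other"
	}
}

// decode unmarshals a JSON object and accepts it only if its fields match the schema exactly.
func decode(data []byte, schema Schema) (map[string]any, error) {
	var doc map[string]any
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	if doc == nil { // JSON null decodes into a nil map without an error
		return nil, fmt.Errorf("not an object")
	}
	if len(doc) != len(schema) {
		return nil, fmt.Errorf("got %d fields, schema has %d", len(doc), len(schema))
	}
	for k, v := range doc {
		if want, got := schema[k], kindOf(v); want != got {
			return nil, fmt.Errorf("field %q: got %s, want %q", k, got, want)
		}
	}
	return doc, nil
}

// roundTrip is the property each fuzz iteration checks: input the decoder
// accepts must survive encode+decode unchanged. Rejected input is skipped.
func roundTrip(data []byte, schema Schema) error {
	doc, err := decode(data, schema)
	if err != nil {
		return nil
	}
	encoded, err := json.Marshal(doc)
	if err != nil {
		return err
	}
	doc2, err := decode(encoded, schema)
	if err != nil {
		return fmt.Errorf("re-decoding failed: %w", err)
	}
	if !reflect.DeepEqual(doc, doc2) {
		return fmt.Errorf("round trip changed %v into %v", doc, doc2)
	}
	return nil
}

var (
	fixedSchema = Schema{"name": "string", "n": "number"}
	fixedDoc    = []byte(`{"name":"a","n":1}`)
)

// checkFixedSchema: the fuzzer varies only the document.
func checkFixedSchema(data []byte) error { return roundTrip(data, fixedSchema) }

// checkFixedDocument: the fuzzer varies only the schema (given as JSON).
func checkFixedDocument(schemaJSON []byte) error {
	var schema Schema
	if err := json.Unmarshal(schemaJSON, &schema); err != nil {
		return nil
	}
	return roundTrip(fixedDoc, schema)
}
```

In a `_test.go` file, each check would be driven by its own `FuzzXxx(f *testing.F)` target that seeds the corpus with `f.Add` and calls the check from `f.Fuzz(func(t *testing.T, data []byte) { ... })`.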

This approach could be improved further by adding more relevant corpus seed entries, but that's a separate issue (#3067).

Is it any better? Yes. The old approach skipped every case where either the document or the schema was invalid, and it was very rare for the fuzzer to generate valid JSON for both, let alone a document and a schema that were even remotely compatible.

Closes #1273

Readiness checklist

I added/updated unit tests (and they pass).
I added/updated integration/compatibility tests (and they pass).
I added/updated comments and checked rendering.
I made spot refactorings.
I updated user documentation.
I ran task all, and it passed.
I ensured that the PR title is good enough for the changelog.
(for maintainers only) I set Reviewers (@FerretDB/core), Labels, Project and project's Sprint fields.
I marked all done items in this checklist.


codecov · 2023-07-18T12:45:08Z

Codecov Report

Merging #3071 (bb89b5b) into main (fbe7748) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3071      +/-   ##
==========================================
- Coverage   76.48%   76.44%   -0.05%     
==========================================
  Files         386      386              
  Lines       21165    21165              
==========================================
- Hits        16188    16179       -9     
- Misses       4050     4060      +10     
+ Partials      927      926       -1

Impacted Files	Coverage Δ
internal/handlers/sjson/sjson.go	`85.65% <100.00%> (+2.17%)`	⬆️

... and 12 files with indirect coverage changes

Flag	Coverage Δ
hana	`?`
integration	`72.98% <0.00%> (-0.04%)`	⬇️
mongodb	`5.54% <0.00%> (ø)`
pg	`66.26% <0.00%> (-0.04%)`	⬇️
shard-1	`54.80% <0.00%> (-0.01%)`	⬇️
shard-2	`55.20% <0.00%> (+0.06%)`	⬆️
shard-3	`57.64% <0.00%> (-0.11%)`	⬇️
sqlite	`46.78% <0.00%> (-0.01%)`	⬇️
unit	`24.23% <100.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown.

chilagrow

Great improvement!

internal/handlers/sjson/array_test.go

internal/handlers/sjson/sjson_test.go

chilagrow

🚀 Let's start fuzzing!

AlekSi

> I marked all done items in this checklist.

Please do that.

And Conform PR fails now ;)

LGTM overall

AlekSi · 2023-07-19T06:12:10Z

Taskfile.yml

+      - go test -run=XXX -fuzz=FuzzArrayWithFixedSchemas      -fuzztime={{.FUZZ_TIME}} ./internal/handlers/sjson/
+      - go test -run=XXX -fuzz=FuzzArrayWithFixedDocuments    -fuzztime={{.FUZZ_TIME}} ./internal/handlers/sjson/
+      - go test -run=XXX -fuzz=FuzzDocumentWithFixedSchemas   -fuzztime={{.FUZZ_TIME}} ./internal/handlers/sjson/
+      - go test -run=XXX -fuzz=FuzzDocumentWithFixedDocuments -fuzztime={{.FUZZ_TIME}} ./internal/handlers/sjson/


They were sorted alphabetically; please move them back.
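
The expected order can be checked mechanically. The hypothetical helpers below (not part of FerretDB's tooling) extract the `-fuzz=` target names from Taskfile command lines and report whether they are alphabetical; within each group, `...WithFixedDocuments` sorts before `...WithFixedSchemas`, so the order in the diff above is not sorted.

```go
package main

import (
	"regexp"
	"sort"
)

// fuzzFlag matches the target name in a `go test -fuzz=Name` command;
// `-fuzztime=` does not match because "time" sits between "fuzz" and "=".
var fuzzFlag = regexp.MustCompile(`-fuzz=(\w+)`)

// fuzzTargets returns the fuzz target names in the order the lines list them.
func fuzzTargets(lines []string) []string {
	var names []string
	for _, line := range lines {
		if m := fuzzFlag.FindStringSubmatch(line); m != nil {
			names = append(names, m[1])
		}
	}
	return names
}

// targetsSorted reports whether the targets are in alphabetical order.
func targetsSorted(lines []string) bool {
	return sort.StringsAreSorted(fuzzTargets(lines))
}
```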

AlekSi · 2023-07-19T06:16:34Z

internal/handlers/sjson/sjson_test.go

+func addRecordedFuzzDocs(f *testing.F, needDocument, needSchema bool) int {
+	// We're trying to use that corpus with our hopes set high,
+	// but chances are, it will still be 0 extra documents.
+	// See #3067 for more details.


We always use full issue URLs and TODO markers for such cases.

It is still like that in the main branch: https://github.com/FerretDB/FerretDB/blob/main/internal/handlers/sjson/sjson_test.go#L196

AlekSi · 2023-07-19T06:18:07Z

internal/handlers/sjson/sjson_test.go

+				// if there was no error,
+				// check that the documents match each other


That comment duplicates the code; we should improve it:

FerretDB/CONTRIBUTING.md

Lines 226 to 229 in fbe7748

- In code comments, in general, do not describe _what_ the code does: it should be clear from the code itself
  (and when it doesn't and the code is tricky, simplify it instead).
  Instead, describe _why_ the code does that if it is not clear from the surrounding context, names, etc.
  There is no need to add comments just because there are no comments if everything is already clear without them.

quasilyte requested review from a team and AlekSi as code owners July 18, 2023 12:41

quasilyte requested a review from rumyantseva July 18, 2023 12:41

mergify bot assigned quasilyte Jul 18, 2023

Merge branch 'main' into quasilyte/improve_sjson_fuzzing

ca2a381

quasilyte changed the title ~~improve sjson package fuzzing~~ Improve sjson package fuzzing Jul 18, 2023

quasilyte enabled auto-merge (squash) July 18, 2023 12:56

apply linter suggestions

4886d1a

quasilyte requested review from a team, chilagrow and noisersup July 18, 2023 13:47

quasilyte disabled auto-merge July 18, 2023 13:52

quasilyte enabled auto-merge (squash) July 18, 2023 13:53

chilagrow previously approved these changes Jul 19, 2023

internal/handlers/sjson/array_test.go (outdated)

internal/handlers/sjson/sjson_test.go

follow the review comments

ca5a8b2

quasilyte dismissed chilagrow’s stale review via ca5a8b2 July 19, 2023 05:39

chilagrow approved these changes Jul 19, 2023

AlekSi approved these changes Jul 19, 2023

AlekSi added this to the Next milestone Jul 19, 2023

Merge branch 'main' into quasilyte/improve_sjson_fuzzing

bb89b5b

quasilyte added the code/chore Code maintenance improvements label Jul 19, 2023

quasilyte merged commit 4106994 into FerretDB:main Jul 19, 2023

quasilyte deleted the quasilyte/improve_sjson_fuzzing branch July 19, 2023 08:25
