Support parallel decompression (pigz) #2640
Conversation
Is there any reason not to make this a compilation option and use pgzip by default?
benchmarkDecompression(b, sizeInMb)
}

func BenchmarkGzipDecompression64(b *testing.B) { benchmarkGzipDecompression(b, 64) }
The benchmarks can just be BenchmarkDecompression* and run with 2 different environments (or compilation options). This also makes it easier to use with benchcmp.
archive/compression/compression.go (outdated)
@@ -37,6 +42,8 @@ const (
Gzip
)
const disablePgzipEnv = "CONTAINER_DISABLE_PGZIP"
Can we avoid using an env var?
Code LGTM, but it looks like b8aca28 still needs the Signed-off-by line.
Thanks for contributing this! http://github.com/klauspost/pgzip has no external dependencies, right? Just double check.
@Random-Liu yes, there are no external dependencies.
@mxpv Thanks!
b.Fatal(err)
}

_, err = ioutil.ReadAll(decompressor)
Try `_, err = io.CopyBuffer(ioutil.Discard, decompressor, buf)`, where `buf` is declared before the closure as `buf := make([]byte, 32*1024)`. I saw a massive improvement locally with that.
I am seeing very different benchmark results. Are you running these with any other flags to see the benefits of parallelism?
@dmcgowan that's interesting. It looks like
@mxpv I am curious if that was similarly the bottleneck for the external process approach. This is a much cleaner approach, but I am wondering if it is a problem with the implementation or the way it is being tested (size of test, parallelism, etc.).
@dmcgowan 4440.48MB/s is way over normal decompression speed. Either the content is way too compressible or too incompressible to be a real-world test. Typical speeds should be at least an order of magnitude slower. Also, if you are just copying to
@dmcgowan I've reworked the benchmark to make it a bit closer to a real-world scenario.
I ran
Can you squash your commits? I'm not a fan of the merge commit; it's better to do a rebase on master, not a merge. After a squash this LGTM.
@crosbymichael done
@@ -0,0 +1,5902 @@
[
Can we move the test data to a gist or somewhere else? (It could be downloaded from TestMain().)
Done
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
I am going to add this to the 1.2 milestone; we should have this in the release. There are still a few changes I want to see, but they could be done after we merge this.
"testing"
)

const benchmarkTestDataURL = "https://git.io/fADcl"
I don't think it's a great idea to call out to an external URL every time the tests are run.
> Can we move test data to gist or somewhere else? (could be downloaded from TestMain())
@AkihiroSuda @stevvooe could you please decide which approach works for you?
Sorry, I realize the previous suggestion may not have been clear. I believe the suggestion to move to a gist was to move the entire benchmark to a gist, not just the input file.
@dmcgowan oh, ok. I've moved the benchmark to https://git.io/fASMy and added sync.Once for initialization.
Benchmark gist: https://git.io/fASMy

Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
Codecov Report
@@ Coverage Diff @@
## master #2640 +/- ##
==========================================
- Coverage 47.54% 43.98% -3.56%
==========================================
Files 87 94 +7
Lines 8211 10362 +2151
==========================================
+ Hits 3904 4558 +654
- Misses 3590 5083 +1493
- Partials 717 721 +4
Continue to review full report at Codecov.
LGTM
LGTM
The benchmarks were deleted in containerd#2640 but we could use that to evaluate zstd further. Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This PR speeds up image (layer) decompression by using parallel gzip (http://github.com/klauspost/pgzip).
In the first revision, decompression was done by running the external unpigz process (similar to how it's done in moby, moby/moby#35697), but benchmarks showed it to be surprisingly slow (I think due to external-process overhead and moving data back and forth via stdin/stdout). Here are the benchmark results:
CC @samuelkarp
Related #1896