Make image (layer) downloads faster by using pigz #35697

Merged (1 commit) on Jan 17, 2018

sargun
Contributor

@sargun sargun commented Dec 4, 2017

- What I did
Go's built-in gzip library is single-threaded and fairly slow
at decompressing. It also decompresses only on demand rather than
pipelining decompression ahead of the reader. This change fixes both problems.

- How I did it
This change tries to use parallel gzip for gzip decompression instead of the
Go implementation. If parallel gzip isn't available, it falls back to the default. This code path can also be disabled with an environment variable.

- How to verify it
There are existing tests for this code path that ensure the correctness of this implementation. I ran some manual benchmarks and got about 50% better performance than with the built-in library. These performance gains were primarily seen on images with layers about 10MB.

- Description for the changelog
Make image (layer) downloads faster by using pgzip
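
For readers following along, here is a minimal sketch of the approach described above: prefer an external `unpigz` binary when one is on `PATH`, fall back to `compress/gzip` otherwise, and let an environment variable turn the external path off. The names `gzDecompress`, `cmdStream`, `roundTrip`, and the variable `EXAMPLE_DISABLE_PIGZ` are hypothetical and chosen for illustration only; this is not the code merged in this PR.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"os/exec"
)

// disableEnv is a hypothetical variable name used only in this sketch.
const disableEnv = "EXAMPLE_DISABLE_PIGZ"

// gzDecompress returns a reader over the decompressed stream. It prefers an
// external unpigz binary and falls back to the standard library.
func gzDecompress(src io.Reader) (io.ReadCloser, error) {
	if os.Getenv(disableEnv) == "" {
		if path, err := exec.LookPath("unpigz"); err == nil {
			return cmdStream(exec.Command(path, "-d", "-c"), src)
		}
	}
	zr, err := gzip.NewReader(src)
	if err != nil {
		return nil, err
	}
	return zr, nil
}

// cmdStream feeds src to the command's stdin and exposes its stdout as a pipe.
func cmdStream(cmd *exec.Cmd, src io.Reader) (io.ReadCloser, error) {
	pr, pw := io.Pipe()
	var stderr bytes.Buffer
	cmd.Stdin, cmd.Stdout, cmd.Stderr = src, pw, &stderr
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	go func() {
		// Propagate a non-zero exit to the reader side of the pipe.
		if err := cmd.Wait(); err != nil {
			pw.CloseWithError(fmt.Errorf("%v: %s", err, stderr.String()))
			return
		}
		pw.Close()
	}()
	return pr, nil
}

// roundTrip gzips s in memory and decompresses it through gzDecompress.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	rc, err := gzDecompress(&buf)
	if err != nil {
		return "", err
	}
	defer rc.Close()
	out, err := io.ReadAll(rc)
	return string(out), err
}

func main() {
	out, err := roundTrip("hello layer")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The caller sees the same `io.ReadCloser` whichever backend is selected, so the choice of decompressor stays an implementation detail.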

@sargun
Contributor Author

sargun commented Dec 5, 2017

Any idea what:

00:02:07 2017/12/05 00:02:07 unrecognized import path "cloud.google.com/go" (parse https://cloud.google.com/go?go-get=1: no go-import meta tags)

Is due to?

@thaJeztah
Member

looks like this is implementing the same as #34788 (looks like that one stalled though)

@sargun
Contributor Author

sargun commented Dec 5, 2017 via email

@thaJeztah
Member

We preferred to use the external binary instead of the Go implementation (see the other pull request), unless there are compelling reasons that you know of.

@sargun sargun force-pushed the use-pgzip branch 4 times, most recently from 7c2c30f to c84bb1a Compare December 5, 2017 09:21
@sargun
Contributor Author

sargun commented Dec 5, 2017

@thaJeztah Switched out. It looks like LZMA / xz was already doing this, so I took this opportunity to share code between the two of them.

@sargun
Contributor Author

sargun commented Dec 5, 2017

Once this is in, I'll add pigz as a recommendation to the packages here: https://github.com/docker/docker-ce-packaging.

@sargun sargun changed the title Make image (layer) downloads faster by using pgzip Make image (layer) downloads faster by using pigz Dec 5, 2017
@sargun
Contributor Author

sargun commented Dec 5, 2017

@thaJeztah I don't see an easy way to install pigz in the container for CI. Any ideas?

@sargun
Contributor Author

sargun commented Dec 6, 2017

@thaJeztah PTAL.

@sargun
Contributor Author

sargun commented Dec 6, 2017

CC:
@ripcurld0
@jboero


unpigzPath, err := exec.LookPath("unpigz")
if err != nil {
	logrus.Debug("Cannot find unpigz")
Member

Possibly change the error to be something like "Unpigz binary not found in PATH, falling back to regular gzip" (better suggestions welcome 😅 )

@thaJeztah
Member

Can you add the package to the Dockerfiles so that it can be run in CI? (Or is it already there?)

@thaJeztah
Member

ping @AkihiroSuda @unclejack @stevvooe PTAL

@AkihiroSuda
Member

I don't think we should add an environment variable.

I suggest introducing a daemon flag --parallel-decompression=pigz|none and passing it to pkg/archive via some WithPigz DecompressOpt like

func DecompressStream(archive io.Reader, opts ...DecompressOpt) (io.ReadCloser, error)
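
The `DecompressOpt` idea above follows Go's functional options pattern. A minimal sketch of how such an option could be plumbed through is shown below; `decompressConfig`, `resolve`, and the `backend` field are hypothetical names for illustration, and nothing like this was merged as part of this PR.

```go
package main

import "fmt"

// decompressConfig is a hypothetical settings struct for this sketch.
type decompressConfig struct {
	backend string // "none" means the built-in Go gzip reader
}

// DecompressOpt mutates the configuration, in the functional options style.
type DecompressOpt func(*decompressConfig)

// WithPigz selects the external pigz backend.
func WithPigz() DecompressOpt {
	return func(c *decompressConfig) { c.backend = "pigz" }
}

// resolve applies the options on top of the default configuration.
func resolve(opts ...DecompressOpt) decompressConfig {
	cfg := decompressConfig{backend: "none"}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Println(resolve().backend)           // prints "none"
	fmt.Println(resolve(WithPigz()).backend) // prints "pigz"
}
```

Existing callers that pass no options keep the default behavior, which is the main appeal of a variadic options parameter over a new positional argument.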

@sargun
Contributor Author

sargun commented Dec 8, 2017

@AkihiroSuda Exposing that knob seems like overkill, and introduces a deprecation cycle if we ever want to revert to using the Golang parallel gzip library.

@sargun
Contributor Author

sargun commented Dec 8, 2017

@AkihiroSuda If we do anything, I'd suggest (part of a different PR), --decompress-gz-with=gzip|unpigz|internal

@AkihiroSuda
Member

> --decompress-gz-with=gzip|unpigz|internal

What is gzip here? Do you mean executing /bin/gzip?

@sargun
Contributor Author

sargun commented Dec 8, 2017

@AkihiroSuda correct.

@AkihiroSuda
Member

How is /bin/gzip useful?

jose-bigio pushed a commit to jose-bigio/docker-ce that referenced this pull request Jan 25, 2018
This change is in response to moby/moby#35697
It adds pigz to the recommended binaries that should be installed with
docker-ce.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Upstream-commit: 1ca014b
Component: packaging
@jboero

jboero commented Apr 23, 2018

Wow I thought nobody liked my pigz proposal but then I just noticed a RHEL 7 install failed with pigz as required prerequisite:

ec2-user@ip- ~> sudo yum install pigz
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
No package pigz available.
Error: Nothing to do

I love that pigz is now used but my PR had no packaging dependency and would fall back to regular gzip if pigz was unavailable. D'oh!

Anybody in packaging please remove pigz from deps... RHEL 7 doesn't have a pigz in repos.

Thanks guys!

@cpuguy83
Member

@jboero Agree having it as a requirement is not right, however there is no version of Docker for RHEL that has pigz support yet.

@sargun
Contributor Author

sargun commented Apr 23, 2018 via email

@cpuguy83
Member

cpuguy83 commented Apr 23, 2018 via email

@jboero

jboero commented Apr 23, 2018

Ah of course. OK thanks guys. I'm glad they pushed for using external pigz instead of trying to rewrite everything from scratch in go. Well done.

@jboero

jboero commented Apr 23, 2018

Workaround - pigz is included in EPEL which works with RHEL (and CentOS)

@lox

lox commented Apr 5, 2019

It's super not clear how to actually make use of this, do we need to ensure that unpigz is available on hosts to see the speed up?

@sargun
Contributor Author

sargun commented Apr 5, 2019

@lox correct
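
To check this on a given host, one option is to run the same kind of lookup shown in the diff snippet earlier in this thread. The sketch below is a standalone program, not part of Docker, and `findUnpigz` is a hypothetical helper name. It only inspects the `PATH` of the process that runs it, which may differ from the daemon's `PATH`.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findUnpigz reports the resolved path of unpigz, if it is on PATH.
func findUnpigz() (string, bool) {
	path, err := exec.LookPath("unpigz")
	if err != nil {
		return "", false
	}
	return path, true
}

func main() {
	if path, ok := findUnpigz(); ok {
		fmt.Println("unpigz found at", path)
	} else {
		fmt.Println("unpigz not found in PATH; expect the built-in gzip fallback")
	}
}
```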

silvin-lubecki pushed a commit to silvin-lubecki/packaging-extract that referenced this pull request Jan 30, 2020
This change is in response to moby/moby#35697
It adds pigz to the recommended binaries that should be installed with
docker-ce.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Upstream-commit: 1ca014b
Component: packaging
@GitHubDiom

GitHubDiom commented Nov 10, 2020

Hey, how do you guys make package decompression parallel?
I'd like to test parallel decompression but can't seem to do it.

docker info

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 7
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.0.2-1.el7.elrepo.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.757GiB
 Name: master
 ID: KGRO:7J7A:7TKD:UNUO:ZQ7A:QYEJ:DJK4:OK4S:7JBS:U5IG:LBS3:2RJ7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

After running docker run gcc:

  • The download is parallel.
    [image]
  • But the extraction is serial.
    [image]

@thaJeztah
Member

@GitHubDiom this PR makes each individual "extract" run faster if pigz is installed, but layers still have to be extracted/applied in sequence; see #21814, #37957 for more details
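
As an illustration of that ordering constraint, the sketch below downloads all layers concurrently but applies them strictly in order. It is a toy model under stated assumptions, not Docker's pull code: `layer`, `pullAll`, and `simulatePull` are hypothetical names, and the sleep only simulates later layers finishing their download first.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// layer is a hypothetical stand-in for a downloaded image layer.
type layer struct {
	id   string
	data []byte
}

// pullAll starts every download at once, then applies layers in list order.
func pullAll(ids []string, download func(string) []byte, apply func(layer)) {
	results := make([]chan layer, len(ids))
	for i, id := range ids {
		results[i] = make(chan layer, 1)
		go func(ch chan<- layer, id string) {
			ch <- layer{id: id, data: download(id)}
		}(results[i], id)
	}
	// Waiting on the channels in order serializes the apply step, even when
	// a later download has already finished.
	for _, ch := range results {
		apply(<-ch)
	}
}

// simulatePull returns the order in which layers were applied.
func simulatePull(ids []string) string {
	delay := make(map[string]time.Duration, len(ids))
	for i, id := range ids {
		delay[id] = time.Duration(len(ids)-i) * 5 * time.Millisecond
	}
	download := func(id string) []byte {
		time.Sleep(delay[id]) // the first layer is the slowest to arrive
		return []byte("blob for " + id)
	}
	var applied []string
	pullAll(ids, download, func(l layer) { applied = append(applied, l.id) })
	return strings.Join(applied, ",")
}

func main() {
	fmt.Println(simulatePull([]string{"base", "deps", "app"})) // prints "base,deps,app"
}
```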

@GitHubDiom

GitHubDiom commented Nov 10, 2020

> @GitHubDiom this PR makes each individual "extract" run faster if pigz is installed, but layers still have to be extracted/applied in sequence; see #21814, #37957 for more details

Can Docker pipeline the pull process for a single layer?
As far as I know, pulling one layer goes through three phases: download (network-intensive) -> decompress (CPU-intensive) -> extract (I/O-intensive),
where the decompress phase decompresses the image from the network adapter buffer into memory, and the extract phase moves the image from memory to disk (is that right?).

For instance, extraction of a layer could start as soon as the beginning of the layer has been decompressed.
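
To make the question concrete, the sketch below shows the three phases overlapping for one layer using only the Go standard library: an `io.Pipe` stands in for the network, `compress/gzip` decompresses as bytes arrive, and `archive/tar` yields entries as soon as they are available. It does not claim to show how Docker behaves; `buildLayer`, `extractStream`, and `streamLayer` are hypothetical names, and the "extract" step only counts bytes instead of writing to disk.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// buildLayer creates a small gzip-compressed tar archive in memory.
func buildLayer(names []string, body string) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, name := range names {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// extractStream decompresses and walks tar entries while data is arriving.
func extractStream(src io.Reader) ([]string, error) {
	gz, err := gzip.NewReader(src)
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	var entries []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return entries, nil
		}
		if err != nil {
			return nil, err
		}
		n, err := io.Copy(io.Discard, tr) // stand-in for writing the file to disk
		if err != nil {
			return nil, err
		}
		entries = append(entries, fmt.Sprintf("%s (%d bytes)", hdr.Name, n))
	}
}

// streamLayer simulates a download that delivers blob in small chunks.
func streamLayer(blob []byte, chunk int) ([]string, error) {
	pr, pw := io.Pipe()
	go func() {
		for len(blob) > 0 {
			n := chunk
			if n > len(blob) {
				n = len(blob)
			}
			if _, err := pw.Write(blob[:n]); err != nil {
				return // reader went away
			}
			blob = blob[n:]
		}
		pw.Close()
	}()
	entries, err := extractStream(pr)
	pr.Close() // unblock the writer if extraction stopped before the end
	return entries, err
}

func main() {
	blob, err := buildLayer([]string{"a.txt", "b.txt"}, "hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	entries, err := streamLayer(blob, 7)
	fmt.Println(entries, err)
}
```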

@thaJeztah
Member

There are some comments about decompress/extract in #21814 (comment) (and further)

fnkr added a commit to punktDe/ansible-docker that referenced this pull request Sep 21, 2021
Can increase Docker image extraction speed significantly:
moby/moby#35697
Okeanos added a commit to Okeanos/dotfiles that referenced this pull request May 12, 2023
jingkaihe added a commit to jingkaihe/concourse-dind that referenced this pull request Oct 6, 2023
and introduce pigz for layer extract speed up:

moby/moby#35697

Signed-off-by: Jingkai He <jingkai@hey.com>
ramizpolic pushed a commit to ramizpolic/concourse-dind that referenced this pull request Apr 2, 2024
* bash script fixing

Signed-off-by: Jingkai He <jingkai@hey.com>

* use the latest focal image as the base image

Signed-off-by: Jingkai He <jingkai@hey.com>

* removed unnecessary dependencies

and introduce pigz for layer extract speed up:

moby/moby#35697

Signed-off-by: Jingkai He <jingkai@hey.com>

* bring net-tools & iproute2 back

Signed-off-by: Jingkai He <jingkai@hey.com>

---------

Signed-off-by: Jingkai He <jingkai@hey.com>