Revisiting mkimage-arch.sh #9764

Merged: 2 commits, Jan 13, 2015

Contributor

@ztombol ztombol commented Dec 21, 2014

Overview

This pull request aims to clean up and slightly reduce the size of Arch Linux base images generated with mkimage-arch.sh. So far it does the following two things:

  • prevent installing more unnecessary packages
  • delete man pages of installed packages

These changes reduce the image size by 17.7 MB or about 5.7% (from 310.4 MB to 292.7 MB).

_Note:_ Image size is the VIRTUAL SIZE as reported by docker inspect on a btrfs file system. The test images were built around 2015-01-08 18:45:48 UTC.

Details

Deleting man pages

The script already skips installing man-db and man-pages to save disk space; however, it does not delete the man pages that ship with the installed packages. rm -r /usr/share/man/* does just that and reduces the image size by about 11.9 MB.
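As a rough illustration of the cleanup step, here is a minimal sketch that removes man pages from an unpacked root filesystem. The PR text only gives the command rm -r /usr/share/man/*; the purge_man_pages helper, the rootfs path argument, and the throwaway demo directory are assumptions made for this example, not part of mkimage-arch.sh.

```shell
# Hypothetical helper: delete all man pages below a root filesystem directory.
purge_man_pages() {
    rootfs="$1"
    # ${rootfs:?} aborts when the argument is empty, so the host's own
    # /usr/share/man is never touched by accident.
    rm -rf -- "${rootfs:?}"/usr/share/man/*
}

# Demo on a throwaway directory that mimics the layout of a rootfs.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/usr/share/man/man1"
echo "dummy page" > "$demo_root/usr/share/man/man1/ls.1"

purge_man_pages "$demo_root"

# The man directory itself is kept; only its contents are removed.
remaining="$(ls -A "$demo_root/usr/share/man")"
echo "entries left: ${remaining:-none}"
rm -r "$demo_root"
```

Keeping the /usr/share/man directory in place (rather than deleting it) mirrors the glob in the original command and avoids surprising tools that expect the directory to exist.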

Revising list of installed packages

This is more about cleaning up than saving a few megabytes, but it does reduce the image size by a further 5.8 MB.

The following is a revised list of packages that should not be installed and a short explanation on why they should be left out, along with a few notes on packages that cannot be left out. Packages denoted with an * are already ignored in the current version of mkimage-arch.sh, the rest are new additions.

For completeness, here are the members of the base group that are installed.
bash, bzip2, coreutils, diffutils, e2fsprogs, file, filesystem, findutils, gawk, gcc-libs, gettext, glibc, grep, gzip, inetutils, iputils, less, licenses, logrotate, pacman, perl, procps-ng, psmisc, sed, shadow, sysfsutils, tar, texinfo, util-linux, which.

_Note:_ groff has been removed from the ignore list because it is actually not installed by base (it's in base-devel).

Image sizes

The following table summarises the space saving introduced in this pull request. Images are cumulative in that an image also contains the changes made by the one above it, e.g. minimal packages also deletes the man pages. Differences in size are calculated against the image immediately above.

image              size (MB)   diff (MB)
original           310.4
deleting man       298.5       -11.9
minimal packages   294.3       -4.2
removed editors    292.7       -1.6
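The arithmetic behind the table can be reproduced with a short sketch. The sizes are copied from the table above; the hyphenated row labels and the output layout are choices made for this example only.

```shell
# Sizes in MB, one image per line; each diff is taken against the previous row.
report="$(printf '%s\n' \
    'original 310.4' \
    'deleting-man 298.5' \
    'minimal-packages 294.3' \
    'removed-editors 292.7' |
awk '
    # First row: remember the starting size, no diff to print.
    NR == 1 { first = $2; prev = $2; printf "%-18s %6.1f\n", $1, $2; next }
    # Later rows: print the size and the signed difference to the row above.
    { printf "%-18s %6.1f %+6.1f\n", $1, $2, $2 - prev; prev = $2 }
    # Summary: total saving in MB and as a share of the original size.
    END { printf "total saved: %.1f MB (%.1f%%)\n", first - prev, (first - prev) / first * 100 }
')"
echo "$report"
```

The last line of the output gives the cumulative saving, which is the sum of the three per-step differences.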

Feedback wanted

The patched script builds a working image, however it's possible that I overlooked an unnecessary package or removed an important one. I linked all package names to the respective pages on the Arch Linux web site to make it easier to review the changes. Let me know what you think.

@ztombol changed the title from "Fix reduce arch rootfs size" to "Revisiting mkimage-arch.sh" on Dec 21, 2014
@thaJeztah
Member

(I'm not an Arch Linux user, just stumbled on this PR)

Don't know if nano and vi are really necessary in a base image? For reference, I created a gist with the packages installed in the debian:wheezy; https://gist.github.com/thaJeztah/2bead9762a7c9df6dee4

@ztombol
Contributor Author

ztombol commented Dec 21, 2014

@thaJeztah Thanks for the gist! It's a good base for comparison. What can you do with iproute or sysvinit in a container? AFAIK, network interface management is done on the host, and containers don't need an init system unless you run multiple services in the same container, in which case supervisor is recommended.

Also, vi and nano may be good targets for removal, though interactive containers may rely on them. Let's wait for feedback from others.


So far the packages I stripped out fall into three categories. Packages that

  • do not work in containers, e.g. linux kernel
  • duplicate work already taken care of by the docker daemon, e.g. managing resolv.conf, configuring network interfaces, handling pci and usb devices (I know this may be possible in a --privileged container, but using privileged containers is discouraged)
  • provide functionality necessary only in application-specific containers, e.g. s-nail

What to include in a container is highly personal, so I tried to avoid imposing my personal preferences on the base image generation. For example I never used ftp or telnet in a container, but they are included in virtually every linux distribution by default. Users take them for granted and removing them may inconvenience some.

Obviously there is a fine line between what to strip out and what to include, and it is largely based on personal preference. What I want to achieve here is to strip out packages that are widely believed to be unnecessary because they provide functionality that belongs to the host, is highly application specific, and/or is not widely used.

@thaJeztah
Member

though interactive containers may rely on them

In my personal opinion, a base image should be a 'clean slate'; editors are really personal and in most cases not even necessary (most containers won't be interactive). If someone wants to have an editor inside the container, he/she should create a base image containing those tools.

Tools like sed, curl, gz etc, are used in many Dockerfiles, often to modify default configuration files or to download and expand dependencies during build, so those are candidates to keep in.

But, agreed, I think some other people should have a look as well. As said, I personally don't use this feature, so I'm not the best person to ask for a use case.

@jessfraz
Contributor

jessfraz commented Jan 6, 2015

I personally think the images should be clean as well and not include editors etc, but @tianon is the main deciding factor here

@tianon
Member

tianon commented Jan 6, 2015

I'm definitely +1 on removing editors, etc. from base images, but sometimes upstream isn't as privy to that (see Ubuntu as an example).

@SvenDowideit
Contributor

awesome info - bookmarking for docs :)

@ztombol
Contributor Author

ztombol commented Jan 8, 2015

Thanks guys! I updated the pull request to remove nano and vi (shaving off another 1.6 MB) and updated the OP to reflect the changes.

@tianon let me know if you want me to squash the commits into one before merging.

pcmciautils,usbutils,jfsutils,xfsprogs,reiserfsprogs,lvm2,mdadm,cryptsetup,\
device-mapper,man-db,man-pages,s-nail,nano,vi
EOF
)
Member

Wouldn't this be easier to maintain long-term if we used something more like:

PKGIGNORE=(
    cryptsetup
    device-mapper
    dhcpcd
    iproute2
    jfsutils
    linux
    lvm2
    man-db
    man-pages
    mdadm
    nano
    netctl
    openresolv
    pciutils
    pcmciautils
    reiserfsprogs
    s-nail
    systemd-sysvcompat
    usbutils
    vi
    xfsprogs
)
IFS=','
PKGIGNORE="${PKGIGNORE[*]}"
unset IFS
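A possible variant of the same idea, sketched here under assumptions: the join runs inside a subshell function so the caller's IFS is never changed and does not need to be unset afterwards. The join_by helper, the PKGIGNORE_CSV name, and the shortened package list are hypothetical and only serve the illustration; in bash the array above would be passed as "${PKGIGNORE[@]}".

```shell
# Join all remaining arguments using the first argument as separator.
# The body runs in a subshell "( ... )", so the IFS change stays local.
join_by() (
    IFS="$1"
    shift
    # "$*" joins the positional parameters with the first character of IFS.
    printf '%s\n' "$*"
)

# Shortened list for illustration; the full list is in the comment above.
PKGIGNORE_CSV="$(join_by ',' cryptsetup device-mapper dhcpcd)"
echo "$PKGIGNORE_CSV"
```

This keeps the one-package-per-line list that is easy to diff, while producing the comma-separated string without touching global shell state.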

@dasJ

dasJ commented Jan 13, 2015

With @tianon's list, these packages are installed into the image:

bzip2
coreutils
diffutils
e2fsprogs
file
filesystem
findutils
gawk
gcc-libs
gettext
glibc
grep
gzip
inetutils
iputils
less
licenses
logrotate
pacman
perl
procps-ng
psmisc
sed
shadow
sysfsutils
tar
texinfo
util-linux
which

If we ignore the fact, that dockerfiles may depend on them, or they may be useful for most of the users, these packages could also be removed: diffutils, file, gettext, grep, inetutils, iputils, logrotate, psmisc, sysfsutils, tar, and which.

procps-ng has to be removed after the pacman keys were populated because procps-ng contains pkill. Removing this package also removes the systemd package.

locale-gen, which is called from the script, requires sed.

When leaving procps-ng and sed in, my image is 285.4 MB in size.

Docker-DCO-1.1-Signed-off-by: Zoltan Tombol <zoltan.tombol@gmail.com> (github: ztombol)
@ztombol force-pushed the fix-reduce-arch-rootfs-size branch from 71fa3a3 to 18c0b41 on January 13, 2015 17:15
@ztombol
Contributor Author

ztombol commented Jan 13, 2015

@tianon good point. Amended.

@tianon
Member

tianon commented Jan 13, 2015

@dasJ so, in both Debian and Gentoo, there's an understanding in package metadata that packages in the base set are not required to be listed explicitly in dependencies of other packages -- is that true in Arch as well? ie, are packages allowed to assume that basic tools like pkill, sed, grep, etc are available without explicitly depending on them?

@tianon
Member

tianon commented Jan 13, 2015

This PR as-is LGTM anyhow, but I think there's clearly room for further discussion about the exact list of packages to purge. 👍

@ztombol
Contributor Author

ztombol commented Jan 13, 2015

@dasJ You left out bash from the list, but other than that it's correct.
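One way to check two package lists against each other is a sorted set difference. The sketch below uses the list from the opening post and the list posted in review, both copied from this thread; the variable names and temporary files are assumptions made for the example.

```shell
# Package lists copied from this thread.
op_list='bash bzip2 coreutils diffutils e2fsprogs file filesystem findutils gawk
gcc-libs gettext glibc grep gzip inetutils iputils less licenses logrotate pacman
perl procps-ng psmisc sed shadow sysfsutils tar texinfo util-linux which'
review_list='bzip2 coreutils diffutils e2fsprogs file filesystem findutils gawk
gcc-libs gettext glibc grep gzip inetutils iputils less licenses logrotate pacman
perl procps-ng psmisc sed shadow sysfsutils tar texinfo util-linux which'

tmp="$(mktemp -d)"
# One package per line (the unquoted expansion splits on whitespace),
# sorted in the C locale so comm sees a consistent order.
printf '%s\n' $op_list | LC_ALL=C sort > "$tmp/op"
printf '%s\n' $review_list | LC_ALL=C sort > "$tmp/review"

# comm -23 keeps only the lines that appear in the first file alone.
missing="$(LC_ALL=C comm -23 "$tmp/op" "$tmp/review")"
echo "only in the opening post: $missing"
rm -r "$tmp"
```

Swapping the two file arguments would list packages that appear only in the review list instead.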

"If we ignore the fact, that dockerfiles may depend on them, or they may be useful for most of the users..."

I think this is the perfect argument against removing those packages.

file, inetutils, iputils, procps-ng, psmisc, sed, sysfsutils, tar and which provide binaries that are too essential to be removed. An image without them would barely work. It would be impossible to derive new images from such a base. Many shell scripts would fail as well.

I'm not sure about logrotate, diffutils and gettext.

PS.: The list of installed packages in the OP was not correct. I missed e2fsprogs and texinfo, and included iproute2. It should be fixed now.

@jessfraz
Contributor

LGTM

jessfraz pushed a commit that referenced this pull request Jan 13, 2015
@jessfraz jessfraz merged commit 21cf4b4 into moby:master Jan 13, 2015
@ztombol
Contributor Author

ztombol commented Jan 13, 2015

@tianon That is not true for Arch.

Packages list all run-time dependencies even if they are in base. For example, device-mapper lists systemd, and both of them are in base. The docker package does the same. I think that's because in Arch base is a suggestion rather than a requirement, and users are encouraged to customise it, even by leaving out packages they don't need (e.g. jfsutils).

So removing the packages suggested by @dasJ may not prevent anyone from installing and using packaged applications. But user-written scripts, e.g. docker entrypoint scripts and the like, would most likely fail.

Note: There is a similar rule for packages built from source. They should not list compile time dependencies that are already in the base-devel group (packages required for building packages, e.g. make and gcc).

@ztombol ztombol deleted the fix-reduce-arch-rootfs-size branch January 13, 2015 19:05
@tianon
Member

tianon commented Jan 13, 2015

Ah cool, thanks for clarifying!
