Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup and improve reproducibility of builds #553

Merged
merged 28 commits into from
Aug 10, 2020
Merged

Cleanup and improve reproducibility of builds #553

merged 28 commits into from
Aug 10, 2020

Conversation

PeterMylemans
Copy link
Contributor

@PeterMylemans PeterMylemans commented Jul 21, 2020

Package Manager package

  • Replaced glob pattern from package manager par with explicit params to make room for additional utilities
  • Added new utility: dpkg_extract extracts a subset of files from a debian archive to the current directory
  • Fixed differences in owner of /etc/os-release when building locally by using the build_tar utility.

cacerts

  • Updated actions to use the dpkg_extract and build_tar for reproducibility
  • Fixed differences in owner when building locally (by switching to build_tar)
  • Added targets in the BUILD to build both cacerts and cacerts_java tar balls for the different distros
  • Got rid of the extract_sh step and reset the visibility of the package to the workspace only. @dlorenc In August 2017 you set the cacerts to public for use in "debian-docker". Is this stil valid?
  • Removed cacerts deb output. Isn't used internally in the repo anymore since the upgrade to debian 10 by @evanj . Before it was added for "tracking package status". But this seems a bit strange to me as we are transforming the upstream package instead of "installing the package".

locale

  • Updated actions to use the dpkg_extract and build_tar for reproducibility
  • Added targets in the BUILD to build locale tar balls for the different distros

Fixes #552
Fixes #458 with the exception of the python 3.8 issue

@googlebot googlebot added the cla: yes CLAs look good label Jul 21, 2020
cacerts/BUILD Outdated Show resolved Hide resolved
@PeterMylemans PeterMylemans marked this pull request as ready for review July 22, 2020 18:32
@evanj
Copy link
Contributor

evanj commented Jul 24, 2020

Just to confirm I understand: This doesn't actually cause any problems for programs using Distroless, this seems to mostly be some simplification/cleanup of the build tools themselves, right?

I think this looks good to me, but I'm not an owner, just a sometime contributor and user :) To be honest I don't follow what exactly is happening with cacerts here. I do know that distroless at one point had somewhat "broken" public CA certificates, since it can't really run the Debian update-ca-certs tool which creates the normal symlink tree in /etc/ssl/certs, so some of this stuff is there to create the "aggregated" cert file. I think there are at least a few tests in the Java and Python tests to check that it works with google.com, because at some point it did not. Hopefully those tests do the trick :)

Copy link
Member

@chanseokoh chanseokoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember that when building the entire images, I saw output of multiple executions of the cacert extraction script. I just think if it is still the case with this new rule, it may increase the overall build time when we start to expand the usage in multiple places. Can it be that case? For example, if I use some extraction rule in java/BUILD, will it unpack a deb and create the same tar for each of the dozens of the Java images?

set -o errexit

DEB=$1
shift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get line 6: shift: shift count out of range when I don't provide any argument to the script. (You may see different behaviors when using a sh vs bash.)

Copy link
Contributor Author

@PeterMylemans PeterMylemans Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have a python version of this script in another personal project that includes a basic form of filtering (excluding) files. That script is also using build_tar instead of the unix tooling.

It was developed after I opened this PR, so it might need some polish (an hour or so).

Might be better to switch to that one instead of fixing this shell script? Or do you have a preference for shell scripts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a shell script is fine. Not that a script is a good thing, but I don't think we want over-engineering. I mean, the goal is to not increase complexity and code lines to reduce maintenance burden. When I first came to this repo, everything was overwhelming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the excluding feature will probably be a nice thing that we may need anyway. But let's not do it here to keep the PR small. We can always do it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed

package_manager/dpkg_extract.sh Outdated Show resolved Hide resolved
package_manager/util.py Show resolved Hide resolved
@chanseokoh
Copy link
Member

  • Fixed differences in owner of /etc/os-release when building locally by using the build_tar utility.

I don't think I saw any changes about this?

Again, thanks for the awesome contribution!

@PeterMylemans
Copy link
Contributor Author

I vaguely remember that when building the entire images, I saw output of multiple executions of the cacert extraction script. I just think if it is still the case with this new rule, it may increase the overall build time when we start to expand the usage in multiple places. Can it be that case? For example, if I use some extraction rule in java/BUILD, will it unpack a deb and create the same tar for each of the dozens of the Java images?

Given that they are now regular targets, it should work like any other dependency. So I would expect cacerts extraction to run twice (once for deb 9 and another time for deb 10). Same for cacerts_java, resulting in 4 targets in total.

@PeterMylemans
Copy link
Contributor Author

PeterMylemans commented Aug 7, 2020

Just to confirm I understand: This doesn't actually cause any problems for programs using Distroless, this seems to mostly be some simplification/cleanup of the build tools themselves, right?

I think this looks good to me, but I'm not an owner, just a sometime contributor and user :) To be honest I don't follow what exactly is happening with cacerts here. I do know that distroless at one point had somewhat "broken" public CA certificates, since it can't really run the Debian update-ca-certs tool which creates the normal symlink tree in /etc/ssl/certs, so some of this stuff is there to create the "aggregated" cert file. I think there are at least a few tests in the Java and Python tests to check that it works with google.com, because at some point it did not. Hopefully those tests do the trick :)

@evanj Correct. My worry relates more to people "highjacking" some of the public targets of this repo. In theory they should have pinned the repo by commit hash in their workspace, but they might run into issues when updating to a newer hash due to change of build tooling that should have been internal to distroless (imho).

@PeterMylemans
Copy link
Contributor Author

  • Fixed differences in owner of /etc/os-release when building locally by using the build_tar utility.

I don't think I saw any changes about this?

Again, thanks for the awesome contribution!

@chanseokoh package_manager/util.py#generate_os_release was changed for this.

@PeterMylemans
Copy link
Contributor Author

@chanseokoh let me know if you're happy with the code. Then I can do a final check to make sure the digests of the unchanged layers match the ones on gcr.

@chanseokoh
Copy link
Member

Looks good to me! Thanks for checking them. Really appreciate it.

@PeterMylemans
Copy link
Contributor Author

PeterMylemans commented Aug 8, 2020

Had to do a small fix for file permissions on /etc/os-release (is back to 0644 as on gcr).

I've got two diffs after local build:

  1. first layer in base (adds certificates). ./usr/share/doc/ca-certificates was added without defining permissions for parent directories (as is standard in the layer building tooling). So now the ./usr, ./usr/share are defined earlier in the tar entry list. This doesn't change the output, but does change the digest. --> one time change to fix the order, but remains stable after this.

Diff on the layer.tar entry list output

< is upstream
> is new
13a14,17
> drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/
> drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/share/
> drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/share/doc/
> drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/share/doc/ca-certificates/
40d43
< drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/
47d49
< drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/share/
72d73
< drwxr-xr-x 0/0               0 1970-01-01 01:00 ./usr/share/doc/
  1. 4th layer in java:base (adds locale c-bin and some other generic java dependencies). Now the entries are always added in alphabetic order (not some random order as before).
0a1,2
> drwxr-xr-x 0/0               0 1970-01-01 01:00 ./
> drwxr-xr-x 0/0               0 1970-01-01 01:00 ./etc/
13,14c15,16
< -rw-r--r-- 0/0              62 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_NAME
< -rw-r--r-- 0/0            2498 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_TIME
---
> -rw-r--r-- 0/0             131 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_ADDRESS
> -rw-r--r-- 0/0         1515838 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_COLLATE
15a18,19
> -rw-r--r-- 0/0             243 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_IDENTIFICATION
> -rw-r--r-- 0/0              23 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_MEASUREMENT
18d21
< -rw-r--r-- 0/0              50 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_NUMERIC
20,24c23,24
< -rw-r--r-- 0/0             243 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_IDENTIFICATION
< -rw-r--r-- 0/0              47 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_TELEPHONE
< -rw-r--r-- 0/0             131 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_ADDRESS
< -rw-r--r-- 0/0              23 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_MEASUREMENT
< -rw-r--r-- 0/0         1515838 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_COLLATE
---
> -rw-r--r-- 0/0              62 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_NAME
> -rw-r--r-- 0/0              50 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_NUMERIC
26c26,27
< drwxr-xr-x 0/0               0 1970-01-01 01:00 ./
---
> -rw-r--r-- 0/0              47 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_TELEPHONE
> -rw-r--r-- 0/0            2498 1970-01-01 01:00 ./usr/lib/locale/C.UTF-8/LC_TIME
86d86
< drwxr-xr-x 0/0               0 1970-01-01 01:00 ./etc/

@PeterMylemans
Copy link
Contributor Author

PeterMylemans commented Aug 8, 2020

@chanseokoh looks good to me. I just had to fix 2 file permissions from 755 to 644 (see last 2 commits after your review).

Two layers will change (one in base and one in java base) as described in the comment above for future reference.

Copy link
Member

@chanseokoh chanseokoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PeterMylemans thanks for thoroughly checking the difference! Just FYI, changes in the parent directories in a tar may sometimes have a serious impact (for example, #456 (comment)), I don't think we have any issue here.

Thanks for making Distroless more robust! Merging this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes CLAs look good
Projects
None yet
4 participants