Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker pull "dry run" mode #16450

Closed
wants to merge 9 commits into from
Closed

Conversation

ygravrand
Copy link

This is our contribution to Docker Global Hack Day 3.
It adds the ability to "dry run" a "docker pull" command, obtaining only the total size of image layers that have to be downloaded.
A new -d flag activates this mode:

docker pull -d jenkins
Using default tag: latest
INFO[0005] POST /v1.21/images/create?dryRun=true&fromImage=jenkins%3Alatest 
INFO[0008] Image manifest for jenkins:latest has been verified 
**** Dry Run - nothing will be downloaded ****
latest: Pulling from library/jenkins
latest: Dry Run: 376344319 bytes to be downloaded, in 33 layers
Status: Dry Run completed for jenkins:latest

At the end of the Hackaton, we found this related Feature request: #1512. Although it's labelled "docker pull info", a comment by @unclejack states that a flag approach would be preferred.
So it seems we're in the right direction with this.

Some limitations:

  • Currently only works with v2 registry.
  • Does not work if a background pull is already running on the same image (but displays a nice warning ;-) )
  • Please review the REST API and CLI parts to be sure we did not break anything.

srevereault and others added 5 commits September 21, 2015 12:55
Signed-off-by: Sylvain Révéreault <sylvain@revereault.fr>
Signed-off-by: Sylvain Révéreault <sylvain@revereault.fr>
Signed-off-by: Sylvain Révéreault <sylvain@revereault.fr>
Signed-off-by: Yann Gravrand <y.gravrand@gmail.com>
Signed-off-by: Yann Gravrand <y.gravrand@gmail.com>
Signed-off-by: Yann Gravrand <y.gravrand@gmail.com>
@@ -112,7 +113,7 @@ func (s *Server) postImagesCreate(ctx context.Context, w http.ResponseWriter, r
OutStream: output,
}

err = s.daemon.Repositories().Pull(image, tag, imagePullConfig)
err = s.daemon.Repositories().Pull(image, tag, imagePullConfig, dryRun != "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would turn on the "dryRun" mode even if dryRun was set to false in the request.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann Seems that the CLI does not send the "dryRun=" parameter when set to false.
Do you see a possible improvement here though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI won't send it, but I could see someone using the API directly inserting dryRun=false.

...which reminds me that this parameter should be documented in docs/reference/api/docker_remote_api.md.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Will change the test and update the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a helper function for this, boolValue

Signed-off-by: Yann Gravrand <y.gravrand@gmail.com>
…n normal mode

Signed-off-by: Yann Gravrand <y.gravrand@gmail.com>
@ygravrand
Copy link
Author

@aaronlehmann Thanks for your review! Can you check it out again?

@@ -76,6 +76,9 @@ func (p *v2Puller) pullV2Repository(tag string) (err error) {
broadcaster.Add(p.config.OutStream)
if found {
// Another pull of the same repository is already taking place; just wait for it to finish
if dryRun {
broadcaster.Write(p.sf.FormatStatus(tag, "!!! Another pull of the same image is already running, dry-run is not possible unless you restart the Docker daemon !!!"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should write to p.config.OutStream rather than broadcaster; otherwise the message will go to all clients that are watching this pull.

I'd also suggest changing the message to remove the !!! characters, and also remove "unless you restart the Docker daemon" (because the user only needs to wait for the pull to finish).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, consider what happens if someone tries to do docker pull myimage and docker pull -dry-run myimage at the same time. If the dry-run pull starts first, the normal pull may attach to that broadcaster, and the pull will never happen.

In dry-run mode, this function should probably avoid calling poolAdd and poolRemoveWithError at all, to avoid being mistaken for an actual pull. It could pass p.config.OutStream to pullV2Tag in this case, instead of broadcaster.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments.
Our client requesting a pull in "dry-mode" shouldn't affect other pulls.
Will make changes as soon as I can.

@aaronlehmann
Copy link
Contributor

Thanks for making the changes so far. I added a few more comments.

@@ -17,6 +17,7 @@ import (
func (cli *DockerCli) CmdPull(args ...string) error {
cmd := Cli.Subcmd("pull", []string{"NAME[:TAG|@DIGEST]"}, "Pull an image or a repository from a registry", true)
allTags := cmd.Bool([]string{"a", "-all-tags"}, false, "Download all tagged images in the repository")
dryRun := cmd.Bool([]string{"d", "-dry-run"}, false, "Dry run mode - only displays the download size")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use the -d option here.
Generally I would associate -d as detach for something like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpuguy83 What do you suggest? -n as in #1512 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-n would the correct param for a "Dry run".

@cpuguy83
Copy link
Member

In general 👎 for a bool argument, and especially adding it to the arguments for pull, which are already massive. I think implementing this as a different puller implementation would be better than adding to the real one.

I also feel like this is getting pretty close to hitting this: https://github.com/docker/docker/blob/611dbd8957581fa451a4103259100a5e2d115b8c/ROADMAP.md#23-remote-registry-operations

ping @stevvooe

@cpuguy83
Copy link
Member

Also, this is pretty cool, I know a lot of people would like to be able to get at this sort of information.

@ygravrand
Copy link
Author

@cpuguy83 Thanks!
By a different puller implementation, do you mean another command? If not, how can the end user switch implementations to get the dry-run one?
Also there didn't seem to be consensus on "flag" or "new command" in #1512.

Signed-off-by: Yann Gravrand <y.gravrand@gmail.com>
@cpuguy83
Copy link
Member

@ygravrand I mean a new implementation for the Puller interface rather than augmenting the existing puller implementation with a new bool.
So basically if docker pull --dry-run is set, The NewPuller function returns the dry-run implementation rather than the real one (just an example, not sure that's necessarily the best place for it).

But still, see the link I shared wrt to the moratorium on code changes to code that deals with the registry.

@@ -112,7 +112,7 @@ func (s *TagStore) Pull(image string, tag string, imagePullConfig *ImagePullConf
lastErr = err
continue
}
if fallback, err := puller.Pull(tag); err != nil {
if fallback, err := puller.Pull(tag, dryRun); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make a lot of sense to have an interface with a dryRun parameter. Just change the implementation of the interface to make it not take an action.

The main problem with this is that one must always consider how dryRun is interpreted when making additions to the implementation. If an individual is fixing a bug in an implementation and is not aware of this interface, we risk having unprotected mutations.

The right approach is to have a separate implementation of puller resolved at instantiation time that implements puller. If we have to parameterize on deeper methods of puller, we should provide interfaces that can be injected further down and do not provide a no-op.

@stevvooe
Copy link
Contributor

First, I'll say this is a much needed and valuable feature. Being able to see what actions a system might take based on the input is immensely and contributes directly to better usability. Albeit, dry run does not really address the request of #1512, which would be better served by a docker remote command, allowing the exploration of remote.

As @cpuguy83 stated, this definitely runs afoul of https://github.com/docker/docker/blob/611dbd8957581fa451a4103259100a5e2d115b8c/ROADMAP.md#23-remote-registry-operations. However, that moratorium has been enacted to ensure that use cases such as dry run are trivially implementable. The main problem plaguing this PR is that there are if statements speckled throughout the code, demonstrating the need for the work that needs to occur before we lift the moratorium. The intent here is not to prevent this kind of feature but rather to enable safe implementation.

Put more succinctly, this PR is so hard to implement because we have not addressed the issues driving the moratorium. I apologize that you may have lost time implementing something we have to say no to, but the hard problems need to be solved here before we continue to add features.

@ygravrand
Copy link
Author

@stevvooe @cpuguy83 Thanks for the review. I agree with you a 100% about the "scattered if statements" ugliness of the submitted code (Go and codebase newcomers :) ). A separate implementation of puller seems far better.
If you're interested we could still try and write such an implementation, even if it doesn't make it past the moratorium, just to see how it looks.

@srevereault
Copy link

I totally agree with @ygravrand : this was our first try at adding features to Docker's codebase, and we almost discovered Go while doing it (during the Docker Global Hack Day), but as the feature seems to interest more people than the two of us, we'll be glad to implement it "the right way" (and we certainly will need guidance for that ;-) ).

@stevvooe
Copy link
Contributor

@ygravrand @srevereault If you're up for it that would be great but I don't want you to have a poor experience from the current state of the codebase (it is both moving and complex). Albeit, one may learn quite a bit.

For this case, a puller implementation may not go far enough. Really, we want to be able to which operations would be carried out based on the current. This interface would both have to support querying what is present and what needs to be downloaded. I put a lot of effort in defining this within https://godoc.org/github.com/docker/distribution but the daemon doesn't map directly to these concepts, per se.

Really, we need to have methods that encode higher-level operations like "pull layer" and "pull manifest". The key to this is having some reference types from distribution/distribution#963 that can address individual components of a pull. The addition of that type really makes this a lot easier, since you don't have to keep adjusting the level of arguments based on the resource. It makes the puller recursive.

From that, there are other details that may come into play. One problem that commonly comes up is querying and managing download state.

I know features are the glamorous part, but there is a lot of core work that could really help move us forward. If you're still game, feel free to contact me via the mailing list or IRC.

@LK4D4
Copy link
Contributor

LK4D4 commented Oct 23, 2015

@ygravrand @srevereault @stevvooe where we here?

@stevvooe
Copy link
Contributor

@LK4D4 This is a needed feature but I'm not sure this design is the right approach. The reference package is in place, so making refactoring the puller should be ongoing.

@ygravrand @srevereault Are you planning to take this further?

@srevereault
Copy link

@stevvooe yes, still planning to work on this, but I couldn't find the time in the last month. I will probably need guidance on this... I suppose looking at distribution/distribution#963 would be a good start ?

@aaronlehmann
Copy link
Contributor

I would recommend holding off on this. The pull and push code has some major changes in the works for 1.10. It would be a lot easier to wait for these changes to land and then build on top of them.

@icecrime
Copy link
Contributor

I agree with @aaronlehmann. As much as I appreciate this feature, I think it's not reasonable to merge right now.

I'm closing this until we can figure it out, which doesn't mean we don't want this, but that we cannot currently merge it. Thank you again for the great contribution @ygravrand: I suggest you open an issue so we can keep track of this, and hopefully revive the subject as soon as the ongoing refactoring work lands.

@icecrime icecrime closed this Oct 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants