
Node images don't expose digests, which means locked pods can't get scheduling benefits #23917

Closed
smarterclayton opened this issue Apr 6, 2016 · 19 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@smarterclayton
Contributor

Anyone who wants reproducible deployments can use image digests (content-addressable hashes, e.g. image@sha256:abceouth). In Docker 1.10, all local images are identified by the content-addressable hash as well as by their tags. In OpenShift, 80-90% of pods use the digest because we have the machinery to resolve tags to digests up front for guaranteed exact deployment.

The current implementation of node images does not expose image digests, so a pod that references an image by digest gets no scheduling benefit. The node should return the list of digests for matching (if two images have different tags but the same digest, they are the same image under the covers, and the scheduler should prefer putting a pod referencing that digest on the same node).

@smarterclayton
Contributor Author

@kubernetes/rh-cluster-infra this means we get no benefit in practice from the current scheduling setup.

@smarterclayton
Contributor Author

I think there are three changes needed:

  1. Add the digest to the node.status.images[].names object as a new name or new field
  2. The kubelet should identify terminal image names that are or have recently been in use (since each layer is also an image, there's some risk that we'd show too many layers). In practice only tagged and terminal images (images not covered by other images) need to be returned. Images can be untagged but still in use.
  3. The scheduler predicate, looking to schedule a pod with name a/b@digest should match ANY name that ends in @digest.

@ncdc
Member

ncdc commented Apr 6, 2016

With Docker 1.10 and beyond, my understanding is that the layer-is-image behavior is no longer available, and you can only run images. Maybe that isn't in 1.10 yet (I haven't checked), but if it isn't, it will be eventually.

I'm not sure exactly what you're getting at with terminal image names. Right now the node status contains the output from docker images as best I can tell (I haven't looked at the code).

As for 3, the first v2 image manifest format includes the image repository name and tag, so matching just by @digest is potentially appropriate. The second v2 format, however, removes name/tag from the manifest, meaning that its digest is actually portable across image repositories and tags, so I think it might be best to do a strict match against the full a/b@digest.

@smarterclayton
Contributor Author

By terminal images I meant images that are only ever pulled by digest: if I pull foo/bar@digest it is not "tagged" in 1.9 and thus doesn't show up. If that has changed in Docker 1.10, then that's good.


@ncdc
Member

ncdc commented Apr 6, 2016

You can see images pulled exclusively by digest using docker images --digests.

@smarterclayton
Contributor Author

Yes, those are the ones that need to be returned by the node.

@ncdc
Member

ncdc commented Apr 6, 2016

Roger. So I would say start by including a/b@digest and having the scheduler do a strict match on the full pull spec. We can consider relaxing it later if needed.

@smarterclayton
Contributor Author

The digest mapping still matters, since a pull wouldn't bring down new layers. Why would we do strict matching?


@ncdc
Member

ncdc commented Apr 6, 2016

If you have a node 1 with a/b@digest, node 2 with x/y@digest (same digest), and you're scheduling a pod with image a/b@digest, the relaxed match would presumably give equal weight/priority to both nodes, whereas a strict match would give higher priority to node 1. You're right that you wouldn't be pulling down new layers, but if the pod were scheduled to node 2, that node would be required to contact the registry to pull down the image in image repo a/b. It would be a fast series of HEAD requests, but still slightly more network traffic and processing than if you scheduled to node 1.

@smarterclayton
Contributor Author

I would agree that a/b should be higher priority, but I would still want x/y to be a match. I wasn't sure if by "strict" you meant "exclude x/y", which I think is too aggressive.

@ncdc
Member

ncdc commented Apr 6, 2016

That works for me - check strict first, relaxed second.

@saad-ali saad-ali added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Apr 7, 2016
@ncdc
Member

ncdc commented May 3, 2016

@smarterclayton the scheduler priority function compares container.Image to the names of the images in the node's status. I'm hesitant to put any logic to parse @sha256:... since that's Docker specific. WDYT?

@smarterclayton
Contributor Author

I think the node should be abstracting that.


@ncdc
Member

ncdc commented May 3, 2016

Are you suggesting that the ContainerImage that is in the node status have both Names and Digests?

@smarterclayton
Contributor Author

Technically the digest is a name.


@ncdc
Member

ncdc commented May 3, 2016

True. Here are the options I can think of:

  1. Augment the list of names for an image in the node to include digests, including the repo; e.g. you would see busybox@sha256:abcd1234 in addition to busybox:latest for the same image
    1. No change to scheduler priority, so it requires an exact match on repo:tag or repo@digest
  2. Augment the list of names for an image in the node to include digest, excluding the repo; e.g. you would see sha256:abcd1234 in addition to busybox:latest for the same image
    1. No change to scheduler priority, but because the image name excludes the repo, this would result in loose matches where only the digest needs to match and the repo is ignored
  3. Modify the image data structure to include a separate field for digests (either with or without the repo)
    1. Modify the scheduler priority to look at the digest field too

My PR #25088 currently implements option 1

@smarterclayton
Contributor Author

Digest name for Docker runtimes is a valid image target, so 1 or 2 seem ok (name is vague in our spec, but equates to tag OR digest in docker). Arguably 2 would give us better matching across the cluster (when tags change but images don't), but changes the semantics of "name" (you can't pull by digest).

We can potentially do a 1b option later - have the scheduler be a bit more aware of digest.


@ncdc
Member

ncdc commented May 3, 2016

I'm going to mark the PR as a fix for this issue if that's ok?

@smarterclayton
Contributor Author

I'm fine with it.


k8s-github-robot pushed a commit that referenced this issue May 8, 2016
Automatic merge from submit-queue

Handle image digests in node status and image GC

Start including Docker image digests in the node status and consider image digests during image
garbage collection.

@kubernetes/rh-cluster-infra @kubernetes/sig-node @smarterclayton 

Fixes #23917