
Node images don't expose digests, which means locked pods can't get scheduling benefits #23917

Closed
smarterclayton opened this issue Apr 6, 2016 · 19 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@smarterclayton
Contributor

Anyone who wants reproducible deployments can use image digests (content-addressable hashes, e.g. image@sha256:abceouth). In Docker 1.10, all local images are identified by the content-addressable hash as well as by their tags. In OpenShift, 80-90% of pods use the digest because we have the machinery to resolve tags to digests up front for guaranteed exact deployment.

The current implementation of node images does not expose image digests, so a pod that references an image by digest gets no scheduling benefit. The node should return the list of digests for matching (if two images have different tags but the same digest, they are the same image under the covers, and the scheduler should prefer putting a pod referencing that digest on the same node).

@smarterclayton
Contributor Author

@kubernetes/rh-cluster-infra this means we get no benefit in practice from the current scheduling setup.

@smarterclayton
Contributor Author

I think there are three changes needed:

  1. Add the digest to the node.status.images[].names object as a new name or new field
  2. The kubelet should identify terminal image names that are or have recently been in use (since each layer is also an image, there's some risk that we'd show too many layers). In practice only tagged and terminal images (images not covered by other images) need to be returned. Images can be untagged but still in use.
  3. The scheduler predicate, looking to schedule a pod with name a/b@digest should match ANY name that ends in @digest.

@ncdc
Member

ncdc commented Apr 6, 2016

With Docker 1.10 and beyond, my understanding is that the layer-is-image behavior is no longer available, and you can only run images. Maybe that isn't in 1.10 yet (I haven't checked), but if it isn't, it will be eventually.

I'm not sure exactly what you're getting at with terminal image names. Right now the node status contains the output from docker images as best I can tell (I haven't looked at the code).

As for 3, the first v2 image manifest format includes the image repository name and tag, so matching just by @digest is potentially appropriate. The second v2 format, however, removes name/tag from the manifest, meaning that its digest is actually portable across image repositories and tags, so I think it might be best to do a strict match against the full a/b@digest.

@smarterclayton
Contributor Author

By terminal images I meant images that are only ever pulled by digest: if I pull foo/bar@digest it is not "tagged" in 1.9 and thus doesn't show up. If that has changed in Docker 1.10, then that's good.


@ncdc
Member

ncdc commented Apr 6, 2016

You can see images pulled exclusively by digest using docker images --digests.

@smarterclayton
Contributor Author

Yes, those are the ones that need to be returned by the node.

@ncdc
Member

ncdc commented Apr 6, 2016

Roger. So I would say start by including a/b@digest and having the scheduler do a strict match on the full pull spec. We can consider relaxing it later if needed.

@smarterclayton
Contributor Author

The digest mapping still matters, since a pull wouldn't bring down new layers. Why would we do strict matching?


@ncdc
Member

ncdc commented Apr 6, 2016

If you have a node 1 with a/b@digest, node 2 with x/y@digest (same digest), and you're scheduling a pod with image a/b@digest, the relaxed match would presumably give equal weight/priority to both nodes, whereas a strict match would give higher priority to node 1. You're right that you wouldn't be pulling down new layers, but if the pod were scheduled to node 2, that node would be required to contact the registry to pull down the image in image repo a/b. It would be a fast series of HEAD requests, but still slightly more network traffic and processing than if you scheduled to node 1.

@smarterclayton
Contributor Author

I would agree that a/b should be higher priority, but I would still want x/y to be a match. I wasn't sure if by "strict" you meant "exclude x/y", which I think is too aggressive.

@ncdc
Member

ncdc commented Apr 6, 2016

That works for me - check strict first, relaxed second.

@saad-ali saad-ali added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Apr 7, 2016
@ncdc
Member

ncdc commented May 3, 2016

@smarterclayton the scheduler priority function compares container.Image to the names of the images in the node's status. I'm hesitant to put any logic to parse @sha256:... since that's Docker specific. WDYT?

@smarterclayton
Contributor Author

I think the node should be abstracting that.


@ncdc
Member

ncdc commented May 3, 2016

Are you suggesting that the ContainerImage that is in the node status have both Names and Digests?

@smarterclayton
Contributor Author

Technically the digest is a name.


@ncdc
Member

ncdc commented May 3, 2016

True. Here are the options I can think of:

  1. Augment the list of names for an image in the node to include digests, including the repo; e.g. you would see busybox@sha256:abcd1234 in addition to busybox:latest for the same image
    1. No change to scheduler priority, so it requires an exact match on repo:tag or repo@digest
  2. Augment the list of names for an image in the node to include digest, excluding the repo; e.g. you would see sha256:abcd1234 in addition to busybox:latest for the same image
    1. No change to scheduler priority, but because the image name excludes the repo, this would result in loose matches where only the digest needs to match and the repo is ignored
  3. Modify the image data structure to include a separate field for digests (either with or without the repo)
    1. Modify the scheduler priority to look at the digest field too

My PR #25088 currently implements option 1

@smarterclayton
Contributor Author

Digest name for Docker runtimes is a valid image target, so 1 or 2 seem ok (name is vague in our spec, but equates to tag OR digest in docker). Arguably 2 would give us better matching across the cluster (when tags change but images don't), but changes the semantics of "name" (you can't pull by digest).

We can potentially do a 1b option later - have the scheduler be a bit more aware of digest.


@ncdc
Member

ncdc commented May 3, 2016

I'm going to mark the PR as a fix for this issue if that's ok?

@smarterclayton
Contributor Author

I'm fine with it.


k8s-github-robot pushed a commit that referenced this issue May 8, 2016
Automatic merge from submit-queue

Handle image digests in node status and image GC

Start including Docker image digests in the node status and consider image digests during image
garbage collection.

@kubernetes/rh-cluster-infra @kubernetes/sig-node @smarterclayton 

Fixes #23917