Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize container image representation #8702

Open
bgrant0607 opened this issue May 22, 2015 · 26 comments
Open

Generalize container image representation #8702

bgrant0607 opened this issue May 22, 2015 · 26 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@bgrant0607
Copy link
Member

Forked from #7018 and #7203.

We need to generalize the representation of container images so that we can support more image formats, pinning to specific image hashes, references to image API objects, and other features in the future. Generalizing this representation breaks ALL API versions, so we should do that now.

The proposed format is here: #7203 (comment)

It follows the approach of VolumeSource.

    ContainerImage json:",inline"
...
type ContainerImage struct {
    DockerImage *DockerImage
}
type DockerImage struct {
    Name string // "repository" in v1 registry speak
    Tag string
}

@bakins @philips @smarterclayton @yifan-gu

@bgrant0607 bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels May 22, 2015
@bgrant0607 bgrant0607 added this to the v1.0-candidate milestone May 22, 2015
@smarterclayton
Copy link
Contributor

One note - with the addition of the digest syntax it's tag or digest. There may be value (speaking from experience) in treating it as an opaque value in the API (DockerImage -> name) vs trying to split out tag and digest and copy them. That avoids the "is empty tag 'latest' or not" question.

On May 22, 2015, at 5:35 PM, Brian Grant notifications@github.com wrote:

Forked from #7018 and #7203.

We need to generalize the representation of container images so that we can support more image formats, pinning to specific image hashes, references to image API objects, and other features in the future. Generalizing this representation breaks ALL API versions, so we should do that now.

The proposed format is here: #7203 (comment)

It follows the approach of VolumeSource.

ContainerImage json:",inline"

...
type ContainerImage struct {
DockerImage *DockerImage
}
type DockerImage struct {
Name string // "repository" in v1 registry speak
Tag string
}
@bakins @philips @smarterclayton @yifan-gu


Reply to this email directly or view it on GitHub.

@bgrant0607
Copy link
Member Author

One of my API Rules is that we shouldn't munge fields set by the user/client. One benefit of a separate field for digest is that it could be filled in asynchronously (by defaulting, admission control, or late initialization) and favored over the tag if present -- the tag would remain for informational purposes, which is useful since the digest isn't human-friendly.

@bgrant0607
Copy link
Member Author

We could also populate tag with "latest" by default to eliminate ambiguity about that.

@smarterclayton
Copy link
Contributor

Although at some point that field is allegedly going away. We definitely regretted decomposing the fields in v1beta1 and went to pure physical and virtual (kind not represented in scheme) object references.

On May 22, 2015, at 6:47 PM, Brian Grant notifications@github.com wrote:

We could also populate tag with "latest" by default to eliminate ambiguity about that.


Reply to this email directly or view it on GitHub.

@bgrant0607
Copy link
Member Author

@smarterclayton Are you suggesting we leave the image string as is, or that we just don't break out tag from name, or do you have another concrete suggestion?

@yifan-gu
Copy link
Contributor

@smarterclayton Could you elaborate what is the bad experience for separating the tags?

@smarterclayton
Copy link
Contributor

Generally not mutate/parse the string at all for Docker and treat it as opaque - if we have a more structured format eventually (URL form) support that via some separate field / struct. My concern is that we are generally dealing with an opaque value (what the Docker API expects) which is insufficiently specified at this time. It did not seem we had a strong reason to parse the value except as a benefit to API consumers - and they already typically deal with the value in an opaque fashion (go-dockerclient).

If we require consumers to split the value themselves that requires them to also implement the same logic as and as Docker. For instance, someone writing a python client of Kube that wants to insert a string image value they get from the Docker API has to parse that value them self, pass it to us, and then we likely reconstruct it back to its original form to pass to adocker.

For Rocket, a structured value is more logical because it's guided by an explicit specification.

On May 22, 2015, at 8:20 PM, Brian Grant notifications@github.com wrote:

@smarterclayton Are you suggesting we leave the image string as is, or that we just don't break out tag from name, or do you have another concrete suggestion?


Reply to this email directly or view it on GitHub.

@smarterclayton
Copy link
Contributor

The Docker 1.6 API and client accept "172.30.180.175:5000/myproject/origin-ruby-sample@sha256:82f39128a0705eba6964bad1e1e548ec72b59e8428914e0001479bdb6211a051" which looks like a tag if you use one parsing algorithm, but not if you use the other. We're currently in OpenShift and Kube using the direct docker parsers pkg

// Get a repos name and returns the right reposName + tag|digest
// The tag can be confusing because of a port in a repository name.
//     Ex: localhost.localdomain:5000/samalba/hipache:latest
//     Digest ex: localhost:5000/foo/bar@sha256:bc8813ea7b3603864987522f02a76101c17ad122e1c46d790efc0fca78ca7bfb
func ParseRepositoryTag(repos string) (string, string) {
    n := strings.Index(repos, "@")
    if n >= 0 {
        parts := strings.Split(repos, "@")
        return parts[0], parts[1]
    }
    n = strings.LastIndex(repos, ":")
    if n < 0 {
        return repos, ""
    }
    if tag := repos[n+1:]; !strings.Contains(tag, "/") {
        return repos[:n], tag
    }
    return repos, ""
}

but it makes me pretty uncomfortable to tell clients they also need to implement that.

@bgrant0607
Copy link
Member Author

Ok, so the problem is the client spec, not the API. There is a server, repo (namespace + name), and tag or digest, but that's encoded in an unfortunate way.

Anyway, the most important part of this change is making it possible to add other specs later. The minimal thing we'd need to do is rename the image field to dockerImage and change it from required to optional. I'd prefer to also add the struct so that we could add fields to dockerImage, though. That would look like:

type DockerImage struct {
    Spec string
}

@yifan-gu
Copy link
Contributor

@smarterclayton As @bgrant0607 pointed out #8702 (comment) Separating the fields sounds more like the future to me. IMHO, this looks cleaner in the API.

But I am OK with just a string for the first iteration to make the change smoother.

@dchen1107
Copy link
Member

cc/ myself

@bgrant0607
Copy link
Member Author

Sorry, but we don't currently have consensus on this change, and we've run out of time to debate/bikeshed the API change. We'll need to revisit in the v2 API.

One suggestion was to serialize new representations into the image string field in older APIs. Problems with that include:

  1. Clients expecting to see a Docker image in that field would be confused/broken by a format not conforming to Docker's spec.
  2. Fields populated by default, such as an image hash, would obstruct diff/merge/patch using old API versions using the string representation. This also means we'd need to preserve strings provided via older API versions in newer API versions so that we wouldn't mutate the strings and break users' declarative config files.
  3. Serialized object references in field values don't get converted by our decoding/encoding mechanism.

cc @thockin

@bgrant0607
Copy link
Member Author

For completeness, we'll be serializing the state using the v1 API and Kubelet will pull pods from apiserver via the v1 API for the foreseeable future, so it will need an encoding in v1 that doesn't lose information.

I don't see a future where we don't need to introduce a new field or fields for new formats, references, etc. We should document that the current image field is optional (i.e., add omitempty) so that we can at least point to that when we break v1 clients after 1.0.

@thockin
Copy link
Member

thockin commented May 27, 2015

Why can't we add a NewImage ImageStruct field alongside Image? Make it
so that one or the other is required in v1b3 and that v1 has only Image ImageStruct. Make the default stringification be docker-compatible
whenever possible (e.g. drop docker://). Don't stringify auto-populated
fields (may require carrying ugly info as to what use specified). I still
have hope that we can get it compatible as an addon...

On Tue, May 26, 2015 at 5:01 PM, Brian Grant notifications@github.com
wrote:

Sorry, but we don't currently have consensus on this change, and we've run
out of time to debate/bikeshed the API change. We'll need to revisit in the
v2 API.

One suggestion was to serialize new representations into the image string
field in older APIs. Problems with that include:

  1. Clients expecting to see a Docker image in that field would be
    confused/broken by a format not conforming to Docker's spec.
  2. Fields populated by default, such as an image hash, would obstruct
    diff/merge/patch using old API versions using the string representation.
    This also means we'd need to preserve strings provided via older API
    versions in newer API versions so that we wouldn't mutate the strings and
    break users' declarative config files.
  3. Serialized object references in field values don't get converted by our
    decoding/encoding mechanism.

cc @thockin https://github.com/thockin


Reply to this email directly or view it on GitHub
#8702 (comment)
.

@thockin
Copy link
Member

thockin commented May 27, 2015

I am not saying it's going to be pleasant, but if we can't figure out how
to make changes like this, what good is all our API versioinig?

On Tue, May 26, 2015 at 6:05 PM, Brian Grant notifications@github.com
wrote:

For completeness, we'll be serializing the state using the v1 API and
Kubelet will pull pods from apiserver via the v1 API for the foreseeable
future, so it will need an encoding in v1 that doesn't lose information.

I don't see a future where we don't need to introduce a new field or
fields for new formats, references, etc. We should document that the
current image field is optional (i.e., add omitempty) so that we can at
least point to that when we break v1 clients after 1.0.


Reply to this email directly or view it on GitHub
#8702 (comment)
.

@yifan-gu
Copy link
Contributor

FWIW, In #8839 I am replacing the string with struct in v1 and the internal representation, I keep the string as they are for all other versions. I am not expecting beta1, beta2, beta3 to run things other than docker image.
Also since there is no consensus on the docker image struct, so I just implement with a string as @bgrant0607 suggested #8702 (comment)

@thockin
Copy link
Member

thockin commented May 27, 2015

The concern is whether we have the bandwidth to take this change before 1.0
(it touches a lot of stuff) or whether it has to wait until after 1.0, in
which case it has to be strictly compatible.

On Tue, May 26, 2015 at 6:23 PM, Yifan Gu notifications@github.com wrote:

FWIW, In #8839
#8839 I am
replacing the string with struct in v1 and the internal representation, I
keep the string as they are for all other versions. I am not expecting
beta1, beta2, beta3 to run things other than docker image.
Also since there is no consensus on the docker image struct, so I just
implement with a string as @bgrant0607 https://github.com/bgrant0607
suggested #8702 (comment)
#8702 (comment)


Reply to this email directly or view it on GitHub
#8702 (comment)
.

@bgrant0607
Copy link
Member Author

@thockin Our API versioning has handled the vast majority of API changes so far, but there are scenarios where it doesn't -- where newer extensions subsuming existing functionality are not representable in older API versions, especially where the fields were required. See also the selector API changes as an example.

@bgrant0607
Copy link
Member Author

@thockin Your proposal is essentially what's proposed above, which was punted out of 1.0, and in #8839, which was submitted today.

@bgrant0607
Copy link
Member Author

We're going to make the image field officially optional in v1, by tagging the field with "omitempty". In practice, it will be required until another image field is added.

@bgrant0607
Copy link
Member Author

cc @caesarxuchao

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
@bgrant0607
Copy link
Member Author

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 23, 2018
@pacoxu
Copy link
Member

pacoxu commented Jun 25, 2021

// ContainerImage describe a container image
type ContainerImage struct {
// Names by which this image is known.
// +optional
Names []string
// The size of the image in bytes.
// +optional
SizeBytes int64
}

A recent PR by @roycaihw : Mark ContainerImage.names as optional #102159 which seem to be related.

Could someone take a look if this issue needs to be working on further?

@mmiranda96
Copy link
Contributor

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests