Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: One Meta Data to Rule Them All => Labels #9882

Merged
merged 7 commits into from
Mar 17, 2015

Conversation

ibuildthecloud
Copy link
Contributor

This is yet another meta data PR in an attempt to pull together multiple PRs hopefully into something we can get into Docker 1.5 (seriously, let's move fast).

Background

Some background... (from what I can gather). There is currently #8955 for adding UserData to a Dockerfile. Basically it adds the USERDATA key=value key="long value" syntax to the Docker. Then there is #9013 which looks to add structured JSON data to a Dockerfile and a container in much the same fashion as Kubernete's annotations. Finally there is #9854 which discusses the need for meta data on container to help with Swarm and similar clustering systems. There are probably another 10 threads that @thaJeztah can find that also talk about the need to add meta data to containers.

We need this...

We need meta data, it's clear user want it.

Labels

We already have labels on Hosts (Docker daemon) today. It seems that going forward we should be able to add labels to everything: Hosts, containers, volumes, images, etc. Let's just continue with that approach. Labels are simple key/value pairs in the style of map[string]string. There have been comments to standarize the format of the keys such that we have namespaces or other means to prevent conflicts. Honestly I don't think that is all that necesary at the Docker level to dictate this. It's just good practice to namespace things. Whether you do "foo_bar" or "foo/bar" or "com.foo/bar", who really cares. If your going to use "id=3", well that's a bad name and somebody may clobber your value. I'm assuming some will disagree with me on this one. That's fine, I'll agree to a namespace standard just as long as it doesn't take three weeks and 132 comments to decide that DNS format is far superior to arbitrary strings split by "/".

Labels are not structured data, and as such this PR is different from #9013, so that discussion can happen differently. Honestly, I'm not in favor of adding structured data to objects and having Docker maintain it. But if others disagree, so be it, we can have structured data as something else.

What about ENV vars

Yes, labels are very close to environment variables. You can add ENV to a Dockerfile and you can add them at container create. The basic difference here is that these key/values are not visible to the running processes in the container.

Lookup by Label

Another key attribute of labels is that you should be able to find an object based on it's label. This should initially be kept very simple. You can either say "give me all images/containers that have key foo." Or you can say "give me all images/containers that have foo=bar".

How should we go about doing this?

I think #8955 is the right start. USERDATA should be renamed to LABEL. Now that we have LABEL on images we need to add --label key=value to docker run/create. This is in the same fashion as ENV and -e today.

Now the only thing left do to is to figure out how to query based on labels. We just need to add --label foo or --label foo=bar to docker images/ps.

It's just that simple folks

Okay, good? Alright, let's move forward...

@thaJeztah
Copy link
Member

Thanks, Darren!

Honestly I don't think that is all that necesary at the Docker level to dictate this. It's just good practice to namespace things. Whether you do "foo_bar" or "foo/bar" or "com.foo/bar", who really cares.

Fully agree on this one. Docker should offer the means to store, search and retrieve the data, but have no opinion on what they are used for, or what (naming)conventions are used. If Docker itself is using meta-data for something, that is just an implementation, just like any other system using the meta-data.

The data stored in a label is just free-form text as well; if an implementation decides to use it for storing JSON, that's fine, but Docker doesn't offer special treatment for those values; no parsing, validation or nested search for JSON properties.

Indexing / performance

To be useful (for example, fetch a container via a "custom" id), querying meta-data should be fast. Useful indexes should probably be present, including "partial" matches or wild-card support, both on "keys" and "values". For example, getting all containers that have a labels with namespace/prefix starting with my.name.space.*

Scope / Visibility

We should probably ask if meta data is only accessible from "outside" containers (in case of meta-data on containers), or also from within a container; I can see use-cases where meta-data can be useful inside a container. How to control access is something to be discussed (also wrt read/read-write)

Inheritance

In case of Image and Container meta-data; should images share the same meta-data as containers running from it? Will they be inherited, but kept separate? Or are they "merged" when creating a container instance?

@phemmer
Copy link
Contributor

phemmer commented Jan 3, 2015

I also agree that docker should not dictate how the data gets used or formatted. I personally think docker tries to dictate things a bit too often. We should recommend a best practice, but if someone wants to ignore best practice, they might have a good reason for it.

However I don't like the term 'label'. Most systems I've ever dealt with have treated a label as value-only data, not key/value (one example being github labels). Just to clarify what I mean, a label would be something like --label foo_bar, where there is no =, and thus no presence of a key.

@jessfraz
Copy link
Contributor

jessfraz commented Jan 5, 2015

I like this FWIW

@erikh
Copy link
Contributor

erikh commented Jan 5, 2015

Paging @vieux and @aluzzardi, as this may be very relevant to their interests.

I also think it's a really good idea. Chef search is very similar and one of its best features.

@ibuildthecloud
Copy link
Contributor Author

I want to make it clear that my intention is to quickly move this forward. I want to find what we can implement now that will give the minimally viable value but also put us on clear path to adding more functionality.

@thaJeztah - Comments below

Indexing/Performance

I completely agree that fast lookup is required. That is one of the fundamental differences between environment variables and labels. For containers I think this can be easily achieved by just keeping the labels in memory in the current data structures. Searching for a label will just be iterating over the list of containers in memory. This means at it’s worse searching for a label will be as slow as docker ps. I think this is sufficient at first. We can improve down the line.

Images become more difficult as you have more images than containers, typically. For images a real index should be built. The problem though is that docker networks are coming soon and volumes will probably not be too far. It seems we should find one consistent approach for labels that works for all object types. As such, I would like to defer on search for images by tag. I’ve currently seen a higher demand for fast lookup of containers, but not the same for images. I’m not saying the use cases don’t exist, just that containers are a higher priority.

The way in which one can search is largely based on the underly index. So supporting wildcard, regexp, etc. has real technical implications. I think we need a good query syntax, but for the first pass I think it is safe to support “give me all containers that have label foo” and “give me all containers that have label foo equal to bar”. The syntax would be docker ps --label foo and docker ps --label foo=bar respectively.

Scope/Visibility

First off, scope visibility doesn’t really matter until we have an introspection service. So this is obviously a discussion that will happen elsewhere, but regardless I’ll say what I think it should be. Labels should follow the existing pattern of ports. That being that they are private by default and must be explicitly published. Defining a label is the same as EXPOSE in the Dockerfile and then publishing the label should follow a pattern like -P and -p .... This means all labels will not be seen by the container unless you explicitly --publish-all-labels or --publish-label foo. It seems a wildcard syntax should exist like --publish-label com.example/*.

Inheritance

I hope you notice a trend in my comments in that we should just follow existing patterns. For inheritance I would expect labels follow the same approach as environment variables. I honestly haven’t given a huge amount of thought to this, but I think the ENV approach should be sufficient.

@phemmer I completely agree that label is a bad term. Unfortunately the precedence has already been set with host labels and Kubernetes labels. I have a strong opinion that it's better to be consistently wrong then inconsistently right. I think we should just stick with the ill named “label.”

Implementation

I have every intention of pushing this through as fast as possible. I’m going to code the implementation of this hopefully today based off of @rhatdan’s existing work in #8955. I'm optimistic the community can come to a consensus.

@dnephin
Copy link
Member

dnephin commented Jan 6, 2015

Labels would be awesome for fig/docker compose. Tracking images with tags would allow users to use any name for their images and containers, and would address some of the performance issues with the current version.

Inheritance

It seems to me like inheritance could be entirely client-side. A client which is creating a container from an image should be able to decide which labels to copy over to the container.

@@ -1,3 +1,5 @@
Okay, I'm lazy. I'll write docs if I think this PR doesn't get violently opposed
to at first...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

giggle - you'll want to remove this >:}

@phemmer
Copy link
Contributor

phemmer commented Jan 6, 2015

Do we ever anticipate labels being added/removed on an already existing container? If so, storing them in the config data precludes this possibility.

@aanand
Copy link
Contributor

aanand commented Jan 6, 2015

I agree with pretty much everything in this proposal - it's basically the one I would've written. As @dnephin says, Compose will be a primary consumer of this feature.

@bfirsh bfirsh added the UX label Jan 6, 2015
@rhatdan
Copy link
Contributor

rhatdan commented Jan 6, 2015

I can go along with this, just as long as something finally gets merged to allow us to add Meta Data. Even if we can just settle on a name Meta->UserData->Label...

@ibuildthecloud ibuildthecloud force-pushed the labels branch 4 times, most recently from 7bcc076 to 7db7ad1 Compare January 7, 2015 05:24
"Vendor=Acme",
"License=GPL",
"Version=1.0"
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a list for a specific reason, or just consistency with Env? Could it be this?

"Labels": {
  "Vendor": "Acme",
  "License": "GPL",
  "Version": "1.0"
},

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently that is to allow multiple keys with the same name (#9882 (comment)).

I wonder (if it should be supported) would be better to store as;

    "Labels": {
      "Vendor": ["Acme"],
      "License": ["GPL"],
      "Version": ["1.0"],
      "foo": ["bar","baz","bam"]
    },

Note; I'm not suggesting that users can directly provide JSON arrays as argument, but only to store multiple --label foo=bar --label foo=baz --label foo=bam

One case that might cause problems here is if one of those labels doesn't have a value (--label foo). Perhaps this should be stored as [NULL,"bar","baz","bam"]? IDK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a more logical data structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah Labels "without values" don't make a lot of sense to me. A label with the empty string as its value, sure, but having a distinct NULL value feels messy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's better; if we're treating label values as strings, then "no" value should be an empty string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lolz. Reading back your comment, you probably suggested to dont add a value to the array at all?

Should probably be fine as well. There's no need to know how many times I set a label, so --label foo will result in {"foo":[]}` (an empty array), right?

A label is a `<key>` / `<value>` pair. Docker stores the values as *strings*.
You can specify multiple labels but each `<key>` / `<value>` must be unique. If
you specify the same `key` multiple times with different values, each subsequent
value overwrites the previous. Docker applies the last `key=value` you supply.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds like it should be written as each <key> will be unique, and have one value

if you say each key=value must be unique, aren't you saying that a key can be in the store more than once?

@SvenDowideit
Copy link
Contributor

yup, minor nits - basically, most of my questions are answered elsewhere, but the reader won't know that at the time.

LGTM - though if you address the nits, it'll Look even better :)

@icecrime
Copy link
Contributor

Thanks all, and thanks Darren for your patience! @moxiegirl will take care of the final adjustments in a separate PR.

icecrime pushed a commit that referenced this pull request Mar 17, 2015
Proposal: One Meta Data to Rule Them All => Labels
@icecrime icecrime merged commit b6ac111 into moby:master Mar 17, 2015
@ibuildthecloud
Copy link
Contributor Author

@icecrime Thank you so much for helping move this along.

@thaJeztah
Copy link
Member

Wow! Happy to see this merged. Thanks @ibuildthecloud for finally making this happen!

@wyaeld
Copy link

wyaeld commented Mar 17, 2015

huge thank-you to all the people who worked on this and the various discussions that led to it, such a useful building block

One question (not to try and increase scope here though). Do people think it makes sense for the distribution components (docker search) to eventually allow for filtering on the labels?

-- `LABEL <key>[=<value>] [<key>[=<value>] ...]`
The **LABEL** instruction adds metadata to an image. A **LABEL** is a
key-value pair. To include spaces within a **LABEL** value, use quotes and
blackslashes as you would in command-line parsing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blackslashes → backslashes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomasTomecek that's "dark matter" I think 😄 Do you want to make a pull-request to fix that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah
@icecrime said that @moxiegirl will do final adjustments; I guess this can be included in those

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right! Thanks for spotting it nevertheless :)

@rhatdan
Copy link
Contributor

rhatdan commented Mar 17, 2015

This is huge. Only been waiting on this one for about a year....

@bfirsh
Copy link
Contributor

bfirsh commented Mar 17, 2015

Yaaaay. What an excellent birthday present. Thanks all for your help.

We're going to try and get Compose support in for this in the next release: docker/compose#1066

@dreamcat4
Copy link

@bfirsh Well if you implement Compose support, please consider @thaJeztah's idea earlier on in this thread ^^. So compose can be auto-converting nested data structured into json text blob. So we can have basically annotations-like feature (immutable), and arbitrary or complex data structures saved into labels.

@nicornk
Copy link

nicornk commented Aug 3, 2015

Hi Guys,
is it possible / how is it possible to add a label to an already running container?
Thanks!

@cpuguy83
Copy link
Member

cpuguy83 commented Aug 3, 2015

@nicornk Not possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.