-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: One Meta Data to Rule Them All => Labels #9882
Conversation
Thanks, Darren!
Fully agree on this one. Docker should offer the means to store, search and retrieve the data, but have no opinion on what they are used for, or what (naming)conventions are used. If Docker itself is using meta-data for something, that is just an implementation, just like any other system using the meta-data. The data stored in a label is just free-form text as well; if an implementation decides to use it for storing JSON, that's fine, but Docker doesn't offer special treatment for those values; no parsing, validation or nested search for JSON properties. Indexing / performanceTo be useful (for example, fetch a container via a "custom" id), querying meta-data should be fast. Useful indexes should probably be present, including "partial" matches or wild-card support, both on "keys" and "values". For example, getting all containers that have a labels with namespace/prefix starting with Scope / VisibilityWe should probably ask if meta data is only accessible from "outside" containers (in case of meta-data on containers), or also from within a container; I can see use-cases where meta-data can be useful inside a container. How to control access is something to be discussed (also wrt read/read-write) InheritanceIn case of Image and Container meta-data; should images share the same meta-data as containers running from it? Will they be inherited, but kept separate? Or are they "merged" when creating a container instance? |
I also agree that docker should not dictate how the data gets used or formatted. I personally think docker tries to dictate things a bit too often. We should recommend a best practice, but if someone wants to ignore best practice, they might have a good reason for it. However I don't like the term 'label'. Most systems I've ever dealt with have treated a label as value-only data, not key/value (one example being github labels). Just to clarify what I mean, a label would be something like |
I like this FWIW |
Paging @vieux and @aluzzardi, as this may be very relevant to their interests. I also think it's a really good idea. Chef search is very similar and one of its best features. |
I want to make it clear that my intention is to quickly move this forward. I want to find what we can implement now that will give the minimally viable value but also put us on clear path to adding more functionality. @thaJeztah - Comments below Indexing/PerformanceI completely agree that fast lookup is required. That is one of the fundamental differences between environment variables and labels. For containers I think this can be easily achieved by just keeping the labels in memory in the current data structures. Searching for a label will just be iterating over the list of containers in memory. This means at it’s worse searching for a label will be as slow as Images become more difficult as you have more images than containers, typically. For images a real index should be built. The problem though is that docker networks are coming soon and volumes will probably not be too far. It seems we should find one consistent approach for labels that works for all object types. As such, I would like to defer on search for images by tag. I’ve currently seen a higher demand for fast lookup of containers, but not the same for images. I’m not saying the use cases don’t exist, just that containers are a higher priority. The way in which one can search is largely based on the underly index. So supporting wildcard, regexp, etc. has real technical implications. I think we need a good query syntax, but for the first pass I think it is safe to support “give me all containers that have label foo” and “give me all containers that have label foo equal to bar”. The syntax would be Scope/VisibilityFirst off, scope visibility doesn’t really matter until we have an introspection service. So this is obviously a discussion that will happen elsewhere, but regardless I’ll say what I think it should be. Labels should follow the existing pattern of ports. That being that they are private by default and must be explicitly published. Defining a label is the same as InheritanceI hope you notice a trend in my comments in that we should just follow existing patterns. For inheritance I would expect labels follow the same approach as environment variables. I honestly haven’t given a huge amount of thought to this, but I think the ENV approach should be sufficient. @phemmer I completely agree that label is a bad term. Unfortunately the precedence has already been set with host labels and Kubernetes labels. I have a strong opinion that it's better to be consistently wrong then inconsistently right. I think we should just stick with the ill named “label.” ImplementationI have every intention of pushing this through as fast as possible. I’m going to code the implementation of this hopefully today based off of @rhatdan’s existing work in #8955. I'm optimistic the community can come to a consensus. |
Labels would be awesome for fig/docker compose. Tracking images with tags would allow users to use any name for their images and containers, and would address some of the performance issues with the current version.
It seems to me like inheritance could be entirely client-side. A client which is creating a container from an image should be able to decide which labels to copy over to the container. |
@@ -1,3 +1,5 @@ | |||
Okay, I'm lazy. I'll write docs if I think this PR doesn't get violently opposed | |||
to at first... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
giggle - you'll want to remove this >:}
Do we ever anticipate labels being added/removed on an already existing container? If so, storing them in the config data precludes this possibility. |
I agree with pretty much everything in this proposal - it's basically the one I would've written. As @dnephin says, Compose will be a primary consumer of this feature. |
I can go along with this, just as long as something finally gets merged to allow us to add Meta Data. Even if we can just settle on a name Meta->UserData->Label... |
7bcc076
to
7db7ad1
Compare
"Vendor=Acme", | ||
"License=GPL", | ||
"Version=1.0" | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a list for a specific reason, or just consistency with Env
? Could it be this?
"Labels": {
"Vendor": "Acme",
"License": "GPL",
"Version": "1.0"
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently that is to allow multiple keys with the same name (#9882 (comment)).
I wonder (if it should be supported) would be better to store as;
"Labels": {
"Vendor": ["Acme"],
"License": ["GPL"],
"Version": ["1.0"],
"foo": ["bar","baz","bam"]
},
Note; I'm not suggesting that users can directly provide JSON arrays as argument, but only to store multiple --label foo=bar --label foo=baz --label foo=bam
One case that might cause problems here is if one of those labels doesn't have a value (--label foo
). Perhaps this should be stored as [NULL,"bar","baz","bam"]
? IDK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a more logical data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thaJeztah Labels "without values" don't make a lot of sense to me. A label with the empty string as its value, sure, but having a distinct NULL value feels messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's better; if we're treating label values as strings, then "no" value should be an empty string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lolz. Reading back your comment, you probably suggested to dont add a value to the array at all?
Should probably be fine as well. There's no need to know how many times I set a label, so --label foo
will result in {"foo":[]}` (an empty array), right?
A label is a `<key>` / `<value>` pair. Docker stores the values as *strings*. | ||
You can specify multiple labels but each `<key>` / `<value>` must be unique. If | ||
you specify the same `key` multiple times with different values, each subsequent | ||
value overwrites the previous. Docker applies the last `key=value` you supply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds like it should be written as each <key> will be unique, and have one value
if you say each key=value
must be unique, aren't you saying that a key can be in the store more than once?
yup, minor nits - basically, most of my questions are answered elsewhere, but the reader won't know that at the time. LGTM - though if you address the nits, it'll Look even better :) |
Thanks all, and thanks Darren for your patience! @moxiegirl will take care of the final adjustments in a separate PR. |
Proposal: One Meta Data to Rule Them All => Labels
@icecrime Thank you so much for helping move this along. |
Wow! Happy to see this merged. Thanks @ibuildthecloud for finally making this happen! |
huge thank-you to all the people who worked on this and the various discussions that led to it, such a useful building block One question (not to try and increase scope here though). Do people think it makes sense for the distribution components ( |
-- `LABEL <key>[=<value>] [<key>[=<value>] ...]` | ||
The **LABEL** instruction adds metadata to an image. A **LABEL** is a | ||
key-value pair. To include spaces within a **LABEL** value, use quotes and | ||
blackslashes as you would in command-line parsing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blackslashes → backslashes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomasTomecek that's "dark matter" I think 😄 Do you want to make a pull-request to fix that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thaJeztah
@icecrime said that @moxiegirl will do final adjustments; I guess this can be included in those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right! Thanks for spotting it nevertheless :)
This is huge. Only been waiting on this one for about a year.... |
Yaaaay. What an excellent birthday present. Thanks all for your help. We're going to try and get Compose support in for this in the next release: docker/compose#1066 |
@bfirsh Well if you implement Compose support, please consider @thaJeztah's idea earlier on in this thread ^^. So compose can be auto-converting nested data structured into |
Hi Guys, |
@nicornk Not possible. |
This is yet another meta data PR in an attempt to pull together multiple PRs hopefully into something we can get into Docker 1.5 (seriously, let's move fast).
Background
Some background... (from what I can gather). There is currently #8955 for adding UserData to a Dockerfile. Basically it adds the
USERDATA key=value key="long value"
syntax to the Docker. Then there is #9013 which looks to add structured JSON data to a Dockerfile and a container in much the same fashion as Kubernete's annotations. Finally there is #9854 which discusses the need for meta data on container to help with Swarm and similar clustering systems. There are probably another 10 threads that @thaJeztah can find that also talk about the need to add meta data to containers.We need this...
We need meta data, it's clear user want it.
Labels
We already have labels on Hosts (Docker daemon) today. It seems that going forward we should be able to add labels to everything: Hosts, containers, volumes, images, etc. Let's just continue with that approach. Labels are simple key/value pairs in the style of
map[string]string
. There have been comments to standarize the format of the keys such that we have namespaces or other means to prevent conflicts. Honestly I don't think that is all that necesary at the Docker level to dictate this. It's just good practice to namespace things. Whether you do "foo_bar" or "foo/bar" or "com.foo/bar", who really cares. If your going to use "id=3", well that's a bad name and somebody may clobber your value. I'm assuming some will disagree with me on this one. That's fine, I'll agree to a namespace standard just as long as it doesn't take three weeks and 132 comments to decide that DNS format is far superior to arbitrary strings split by "/".Labels are not structured data, and as such this PR is different from #9013, so that discussion can happen differently. Honestly, I'm not in favor of adding structured data to objects and having Docker maintain it. But if others disagree, so be it, we can have structured data as something else.
What about ENV vars
Yes, labels are very close to environment variables. You can add ENV to a Dockerfile and you can add them at container create. The basic difference here is that these key/values are not visible to the running processes in the container.
Lookup by Label
Another key attribute of labels is that you should be able to find an object based on it's label. This should initially be kept very simple. You can either say "give me all images/containers that have key foo." Or you can say "give me all images/containers that have foo=bar".
How should we go about doing this?
I think #8955 is the right start.
USERDATA
should be renamed toLABEL
. Now that we haveLABEL
on images we need to add--label key=value
todocker run/create
. This is in the same fashion asENV
and-e
today.Now the only thing left do to is to figure out how to query based on labels. We just need to add
--label foo
or--label foo=bar
todocker images/ps
.It's just that simple folks
Okay, good? Alright, let's move forward...