Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loosen label and annotation validation and related tests #4628

Closed
thockin opened this issue Feb 19, 2015 · 30 comments
Closed

Loosen label and annotation validation and related tests #4628

thockin opened this issue Feb 19, 2015 · 30 comments
Assignees
Labels
area/api Indicates an issue on api area. area/usability priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Milestone

Comments

@thockin
Copy link
Member

thockin commented Feb 19, 2015

From #4486

Proposed:

  • label values: restrict to [A-Za-z0-9_-.]*
  • annotation values: no restriction
  • qualified names (label and annotation keys, resource names, volume plugins): loosen restrictions to ([A-Za-z0-9_-.]+/)?[A-Za-z0-9_-.]+ - this should be a strict superset of
    what we allow today. We can say that convention is to use dns-compatible
    domain + label, but still allow things like FOO/B_A.R. This DOES NOT allow for foo.com/bat/bat - is that an important affordance to anyone? We could allow that, but we need to be clear that "bar/bat" means (bar, bat) while "foo.com/bar/bat" means (foo.com, bar/bat).

@smarterclayton does that give you enough freedom?
@bgrant0607 does that give you enough consistency?
@quinton-hoole-google does this make the qualified name type now viable for your use case in kube-proxy?

@smarterclayton
Copy link
Contributor

Yes on the freedom question. I assume we will spec case-sensitivity?

----- Original Message -----

From #4486

Proposed:

  • label values: restrict to [A-Za-z0-9_-.]*
  • annotation values: no restriction
  • qualified names (label and annotation keys, resource names, volume
    plugins): loosen restrictions to ([A-Za-z0-9_-.]+/)?[A-Za-z0-9_-.]+ - this
    should be a strict superset of
    what we allow today. We can say that convention is to use dns-compatible
    domain + label, but still allow things like FOO/B_A.R. This DOES NOT allow
    for foo.com/bat/bat - is that an important affordance to anyone? We could
    allow that, but we need to be clear that "bar/bat" means (bar, bat) while
    "foo.com/bar/bat" means (foo.com, bar/bat).

@smarterclayton does that give you enough freedom?
@bgrant0607 does that give you enough consistency?
@quinton-hoole-google does this make the qualified name type now viable for
your use case in kube-proxy?


Reply to this email directly or view it on GitHub:
#4628

@thockin
Copy link
Member Author

thockin commented Feb 19, 2015

You mean "yes, it is case sensitive"? Yes, sure.

On Thu, Feb 19, 2015 at 2:45 PM, Clayton Coleman notifications@github.com
wrote:

Yes on the freedom question. I assume we will spec case-sensitivity?

----- Original Message -----

From #4486

Proposed:

  • label values: restrict to [A-Za-z0-9_-.]*
  • annotation values: no restriction
  • qualified names (label and annotation keys, resource names, volume
    plugins): loosen restrictions to ([A-Za-z0-9_-.]+/)?[A-Za-z0-9_-.]+ -
    this
    should be a strict superset of
    what we allow today. We can say that convention is to use dns-compatible
    domain + label, but still allow things like FOO/B_A.R. This DOES NOT
    allow
    for foo.com/bat/bat - is that an important affordance to anyone? We
    could
    allow that, but we need to be clear that "bar/bat" means (bar, bat) while
    "foo.com/bar/bat" means (foo.com, bar/bat).

@smarterclayton does that give you enough freedom?
@bgrant0607 does that give you enough consistency?
@quinton-hoole-google does this make the qualified name type now viable
for
your use case in kube-proxy?


Reply to this email directly or view it on GitHub:
#4628

Reply to this email directly or view it on GitHub
#4628 (comment)
.

@bgrant0607
Copy link
Member

Thanks. Yes, I'm happy with this.

Do we auto-lowercase label keys today?

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. area/usability sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 19, 2015
@bgrant0607 bgrant0607 added this to the v1.0 milestone Feb 19, 2015
@smarterclayton
Copy link
Contributor

We do not, we fail validation (in master at right now)

----- Original Message -----

Thanks. Yes, I'm happy with this.

Do we auto-lowercase label keys today?


Reply to this email directly or view it on GitHub:
#4628 (comment)

@ghost
Copy link

ghost commented Feb 19, 2015

Yes, I'm reasonably happy with that, from my as yet limited perspective. My personal preference would be to ultimately support both hierarchical names and hierarchical namespaces, somehow (e.g. something like "subdept.dept.acme.com/workgroup/product/user/resource" means (subdept.dept.acme.com, workgroup/product/user/resource), as this makes for convenient ACL's and the like. So I support your "We could allow that..." part, and associated semantics.

@thockin
Copy link
Member Author

thockin commented Feb 19, 2015

I'm not sure we want to zoom in to do ACLs or things on segments of the
name, but it's plausible anyway.

On Thu, Feb 19, 2015 at 3:27 PM, Quinton Hoole notifications@github.com
wrote:

Yes, I'm reasonably happy with that, from my as yet limited perspective.
My personal preference would be to ultimately support both hierarchical
names and hierarchical namespaces, somehow (e.g. something like "
subdept.dept.acme.com/workgroup/product/user/resource" means (
subdept.dept.acme.com, workgroup/product/user/resource), as this makes
for convenient ACL's and the like. So I support your "We could allow
that..." part, and associated semantics.

Reply to this email directly or view it on GitHub
#4628 (comment)
.

@gmarek
Copy link
Contributor

gmarek commented Feb 26, 2015

I can write this change, but there's a couple of things to decide:

  • do we allow comma in label values (we did until now, but in this proposal we don't)? [I guess we do]
  • do we really want to allow Qualified Names to end with dash/period/underscore? [I guess we don't]
  • what max length of Qualified Names we allow? [I guess 253]
  • do we want to allow completely arbitrary annotations, and do we want to enforce some length limit?

@smarterclayton
Copy link
Contributor

On Feb 26, 2015, at 6:56 AM, gmarek notifications@github.com wrote:

I can write this change, but there's a couple of things to decide:

do we allow comma in label values (we did until now, but in this proposal we don't)? [I guess we do]
do we really want to allow Qualified Names to end with dash/period/underscore? [I guess we don't]
Is there a disadvantage?
what max length of Qualified Names we allow? [I guess 253]
It should be defined in the identifiers section, but if not should be predictable to our other names.
do we want to allow completely arbitrary annotations, and do we want to enforce some length limit?
We should set a maximum to limit abuse, but it should probably be total size of all annotation values and keys, not any particular one (I don't see a reason any individual annotation can't be large). 50K total as a straw man.

Reply to this email directly or view it on GitHub.

@thockin
Copy link
Member Author

thockin commented Feb 26, 2015

On Thu, Feb 26, 2015 at 8:24 AM, Clayton Coleman
notifications@github.com wrote:

On Feb 26, 2015, at 6:56 AM, gmarek notifications@github.com wrote:

I can write this change, but there's a couple of things to decide:

do we allow comma in label values (we did until now, but in this proposal we don't)? [I guess we do]
do we really want to allow Qualified Names to end with dash/period/underscore? [I guess we don't]
Is there a disadvantage?

I don't think this is a big deal, but I agree - given that DNS names
don't allow trailing punctuation, it's not unreasonable to assume this
shouldn't. It's a marginally more complex regex is all.

what max length of Qualified Names we allow? [I guess 253]
It should be defined in the identifiers section, but if not should be predictable to our other names.

Previously it was max(dns_label + max(dns_subdomain) + 1. I think
making this be the same as dns subdomain is fine.

do we want to allow completely arbitrary annotations, and do we want to enforce some length limit?
We should set a maximum to limit abuse, but it should probably be total size of all annotation values and keys, not any particular one (I don't see a reason any individual annotation can't be large). 50K total as a straw man.

by 50, Clayton obviously means 64.

@bgrant0607
Copy link
Member

SGTM

@smarterclayton
Copy link
Contributor

Obviously

On Feb 26, 2015, at 11:35 AM, Tim Hockin notifications@github.com wrote:

On Thu, Feb 26, 2015 at 8:24 AM, Clayton Coleman
notifications@github.com wrote:

On Feb 26, 2015, at 6:56 AM, gmarek notifications@github.com wrote:

I can write this change, but there's a couple of things to decide:

do we allow comma in label values (we did until now, but in this proposal we don't)? [I guess we do]
do we really want to allow Qualified Names to end with dash/period/underscore? [I guess we don't]
Is there a disadvantage?

I don't think this is a big deal, but I agree - given that DNS names
don't allow trailing punctuation, it's not unreasonable to assume this
shouldn't. It's a marginally more complex regex is all.

what max length of Qualified Names we allow? [I guess 253]
It should be defined in the identifiers section, but if not should be predictable to our other names.

Previously it was max(dns_label + max(dns_subdomain) + 1. I think
making this be the same as dns subdomain is fine.

do we want to allow completely arbitrary annotations, and do we want to enforce some length limit?
We should set a maximum to limit abuse, but it should probably be total size of all annotation values and keys, not any particular one (I don't see a reason any individual annotation can't be large). 50K total as a straw man.

by 50, Clayton obviously means 64.

Reply to this email directly or view it on GitHub.

@gmarek gmarek self-assigned this Feb 27, 2015
@gmarek
Copy link
Contributor

gmarek commented Feb 27, 2015

There's a small issue with label values: they were not checked until now. E.g. in guestbook example there's a comma used in label value, which is forbidden both in current implementation and Brendans suggestion, but it was OK, as it was never checked. I don't know how many things will brake when we'll turn the value validation on.

In addition current implementation allows backslashes and Brandans suggestion does not - is this a mistake, or is it by design?

@sdminonne
Copy link
Contributor

@gmarek. By design, @smarterclayton suggested to 'preserve' backslash for future uses. I translated it in 'admit it'.

@gmarek
Copy link
Contributor

gmarek commented Feb 27, 2015

I'm sorry, but I don't understand. Is Brendans suggestion by design omit backslash, or originally it was there by design? (I guess you meant the former, but I don't know here the original discussion was)

@sdminonne
Copy link
Contributor

@gmarek as far as I remember during the discussion someone (guessing @smarterclayton) raised the point about unicode and char to be esacped. But I cannot find anymore asked me to reserve backslash for future uses. That's why I modified the regexp and I specifically added an entry in unit test for this.

The starting discussion was here: #341
Then #4486

And by design I mean "by design" we permit the backslash char.

@smarterclayton
Copy link
Contributor

For label queries, reserving backslash for escaping values we might use in the future. For values themselves I don't believe we need any escaping.

On Feb 27, 2015, at 10:47 AM, Salvatore Dario Minonne notifications@github.com wrote:

@gmarek as far as I remember during the discussion someone (guessing @smarterclayton) raised the point about unicode and char to be esacped. But I cannot find anymore asked me to reserve backslash for future uses. That's why I modified the regexp and I specifically added an entry in unit test for this.

The starting discussion was here: #341
Then #4486


Reply to this email directly or view it on GitHub.

@gmarek
Copy link
Contributor

gmarek commented Feb 27, 2015

PR for this #4898

@gmarek
Copy link
Contributor

gmarek commented Feb 27, 2015

Oh, I missed those updates. I'll add a backslash on Monday.

@thockin
Copy link
Member Author

thockin commented Feb 27, 2015

wait wait.

I do not think that adding backslash to the allowed set is what was
suggested. Backslash MAY be needed when parsing strings but as yet it has
not been demonstrated. Let's please try to keep it as minimal as possible
and expand when we have demonstrated reasons.

Let's discuss comma - I don't want it in the set, but is it worth a
breakage risk?

On Fri, Feb 27, 2015 at 8:43 AM, gmarek notifications@github.com wrote:

Oh, I missed those updates. I'll add a backslash on Monday.

Reply to this email directly or view it on GitHub
#4628 (comment)
.

@gmarek
Copy link
Contributor

gmarek commented Feb 27, 2015

Well, comma is only a special case because it's used in Guestbook example (frontend-controller.json). Because this example is the most visible one (at least for us in Warsaw), my feeling is that it was copied a lot of times as a framework, hence making it invalid may be problematic...

Generally adding any nontrivial validation can break something, but I guess we'll need to live with it.

@thockin
Copy link
Member Author

thockin commented Feb 27, 2015

If we have to add comma, it's not the end of the world, but we could also
just fix the example and say "yeah, we broke it". Adding any sort of
restriction is likely to break someone.

I'm open to arguments either way with comma. I'd rather not do it, but
I'll accept it.

On Fri, Feb 27, 2015 at 10:09 AM, gmarek notifications@github.com wrote:

Well, comma is only a special case because it's used in Guestbook example
(frontend-controller.json). Because this example is the most visible one
(at least for us in Warsaw), my feeling is that it was copied a lot of
times as a framework, hence making it invalid may be problematic...

Generally adding any nontrivial validation can break something, but I
guess we'll need to live with it.

Reply to this email directly or view it on GitHub
#4628 (comment)
.

@smarterclayton
Copy link
Contributor

I don't like commas in values of labels. It's a natural separator.

On Feb 27, 2015, at 1:15 PM, Tim Hockin notifications@github.com wrote:

If we have to add comma, it's not the end of the world, but we could also
just fix the example and say "yeah, we broke it". Adding any sort of
restriction is likely to break someone.

I'm open to arguments either way with comma. I'd rather not do it, but
I'll accept it.

On Fri, Feb 27, 2015 at 10:09 AM, gmarek notifications@github.com wrote:

Well, comma is only a special case because it's used in Guestbook example
(frontend-controller.json). Because this example is the most visible one
(at least for us in Warsaw), my feeling is that it was copied a lot of
times as a framework, hence making it invalid may be problematic...

Generally adding any nontrivial validation can break something, but I
guess we'll need to live with it.

Reply to this email directly or view it on GitHub
#4628 (comment)
.


Reply to this email directly or view it on GitHub.

@thockin
Copy link
Member Author

thockin commented Feb 27, 2015

I'm inclined to fix the example and take the hit on this one, while we
can. Once v1beta3 becomes v1, we're locked in.

On Fri, Feb 27, 2015 at 10:25 AM, Clayton Coleman notifications@github.com
wrote:

I don't like commas in values of labels. It's a natural separator.

On Feb 27, 2015, at 1:15 PM, Tim Hockin notifications@github.com
wrote:

If we have to add comma, it's not the end of the world, but we could
also
just fix the example and say "yeah, we broke it". Adding any sort of
restriction is likely to break someone.

I'm open to arguments either way with comma. I'd rather not do it, but
I'll accept it.

On Fri, Feb 27, 2015 at 10:09 AM, gmarek notifications@github.com
wrote:

Well, comma is only a special case because it's used in Guestbook
example
(frontend-controller.json). Because this example is the most visible
one
(at least for us in Warsaw), my feeling is that it was copied a lot of
times as a framework, hence making it invalid may be problematic...

Generally adding any nontrivial validation can break something, but I
guess we'll need to live with it.

Reply to this email directly or view it on GitHub
<
https://github.com/GoogleCloudPlatform/kubernetes/issues/4628#issuecomment-76442562>

.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#4628 (comment)
.

@bgrant0607
Copy link
Member

Comma is out. I don't really want to think about escaping right now.

@bgrant0607
Copy link
Member

BTW, I think "reserving for future use" means that we have to DISALLOW such characters now, lest people start using them in ways that are incompatible with our future plans. We can add flexibility in the future, but it's impossible to take away.

@smarterclayton
Copy link
Contributor

Right

On Feb 27, 2015, at 4:05 PM, Brian Grant notifications@github.com wrote:

BTW, I think "reserving for future use" means that we have to DISALLOW such characters now, lest people start using them in ways that are incompatible with our future plans. We can add flexibility in the future, but it's impossible to take away.


Reply to this email directly or view it on GitHub.

@thockin
Copy link
Member Author

thockin commented Feb 27, 2015

Let's all make a pact to challenge anyone who adds a "string" type to the
API to document the validation rules for it. Or even move away from string
as a literal type and encourage typedefs to capture semantics.

On Fri, Feb 27, 2015 at 2:14 PM, Clayton Coleman notifications@github.com
wrote:

Right

On Feb 27, 2015, at 4:05 PM, Brian Grant notifications@github.com
wrote:

BTW, I think "reserving for future use" means that we have to DISALLOW
such characters now, lest people start using them in ways that are
incompatible with our future plans. We can add flexibility in the future,
but it's impossible to take away.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#4628 (comment)
.

@ghost
Copy link

ghost commented Mar 2, 2015

On Feb 27, 2015 2:18 PM, "Tim Hockin" notifications@github.com wrote:

Let's all make a pact to challenge anyone who adds a "string" type to the
API to document the validation rules for it. Or even move away from string
as a literal type and encourage typedefs to capture semantics.

+100 ! I'd be in favor of systematically getting rid of string as a
parameter type internally too, not just in the API. Strings are
just too prone to misunderstanding and subtle bugs. E.g. does this "name"
string include the namespace prefix or not? The code I've been working on
recently is riddled with these types of latent bugs, both in core code and
in unit and system tests.

Q

On Fri, Feb 27, 2015 at 2:14 PM, Clayton Coleman <notifications@github.com

wrote:

Right

On Feb 27, 2015, at 4:05 PM, Brian Grant notifications@github.com
wrote:

BTW, I think "reserving for future use" means that we have to DISALLOW
such characters now, lest people start using them in ways that are
incompatible with our future plans. We can add flexibility in the
future,
but it's impossible to take away.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
<
#4628 (comment)

.


Reply to this email directly or view it on GitHub.

@gmarek
Copy link
Contributor

gmarek commented Mar 6, 2015

As @thockin suggested, we're changing regexp for label values. Tim suggested to have the same rules for values and qualified name pieces (i.e. '.' should not be correct label value), with the difference that we allow empty values. Are everyone OK, with the new regexp:

(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?"

@gmarek
Copy link
Contributor

gmarek commented Mar 13, 2015

I think this is done.

@gmarek gmarek closed this as completed Mar 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/usability priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

6 participants