UpdateApplyAnnotation stores large annotation on every kubectl operation #15878

Closed
liggitt opened this issue Oct 19, 2015 · 26 comments
Labels
area/app-lifecycle area/kubectl priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@liggitt
Member

liggitt commented Oct 19, 2015

#14626 is adding the full object serialization as an annotation on almost every mutating kubectl operation. This is doubling serialization size unnecessarily, and breaking any objects that were close to the annotation size limit. What benefit is this providing? The apply command should be able to do a 3-way merge between the current state of the API object and the desired state client-side. Not doing so means that kubectl apply isn't useful against objects created by older clients, or objects created or modified by non-kubectl clients.
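
For reference, a rough sketch of what that behavior amounts to. The annotation key is the one kubectl uses for the last-applied configuration; the helper and its signature are hypothetical, not the actual kubectl code, and the real code avoids nesting the annotation inside itself (this sketch glosses over that):

package sketch

import "encoding/json"

const lastAppliedKey = "kubectl.kubernetes.io/last-applied-configuration"

// setApplyAnnotation embeds a full serialized copy of the object into its own
// annotations, which is what roughly doubles the stored size of the object.
func setApplyAnnotation(obj map[string]interface{}) error {
	serialized, err := json.Marshal(obj) // second, full copy of the object
	if err != nil {
		return err
	}
	meta, ok := obj["metadata"].(map[string]interface{})
	if !ok {
		meta = map[string]interface{}{}
		obj["metadata"] = meta
	}
	annotations, ok := meta["annotations"].(map[string]interface{})
	if !ok {
		annotations = map[string]interface{}{}
		meta["annotations"] = annotations
	}
	annotations[lastAppliedKey] = string(serialized)
	return nil
}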

@liggitt
Member Author

liggitt commented Oct 19, 2015

cc from original PR: @bgrant0607, @ghodss, @janetkuo, @mikedanese, @jlowdermilk, @jackgr

@bgrant0607
Member

Full discussion was in #1702, with earlier discussion in #1178.

For users of kubectl apply, the annotations are not unnecessary, as they record the previous configuration specified by the user, which otherwise is not recorded anywhere, for use in the 3-way merge, the other 2 inputs being the new configuration and the live API state.

Originally I intended only kubectl apply to set/update these annotations, so that users/clients that didn't use kubectl apply would not be affected. We could potentially revert to that behavior, at the cost of making kubectl apply somewhat less convenient.

When using kubectl apply against older or non-kubectl clients, it's true that the first apply operation doesn't have the benefit of the annotations, so removal of items from the spec cannot be inferred automatically.

const totalAnnotationSizeLimitB int = 64 * (1 << 10) // 64 kB
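
For context, roughly how that limit is applied (a paraphrase, not the exact validation code): the cap is on the combined size of all annotation keys and values on the object.

func totalAnnotationSize(annotations map[string]string) int {
	size := 0
	for k, v := range annotations {
		size += len(k) + len(v) // keys and values both count against the cap
	}
	return size
}

An update whose annotations exceed totalAnnotationSizeLimitB is rejected with a too-long validation error on the annotations field.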

We could double the total annotation size limit, or exempt these annotations from the limit, or remove the limit, in favor of size limits enforced via LimitRange and ResourceQuota.

In the future we might store this information in separate API resources, but that won't reduce the amount of storage consumed.

@jackgr
Contributor

jackgr commented Oct 19, 2015

The apply command can do a 2-way merge between the current state of the API object and the desired state client-side. By definition, there are only 2 things to compare in such a scenario, so a 3-way merge is not possible. When the annotation is present, it defines a third version of the data, so a 3-way merge then becomes possible.

Regarding the benefit, it allows users to modify previous configuration and reapply it. Without the annotation, there's no way to tell what the user has removed since the previous version, and therefore no way to know what can safely be removed from the current state.
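
To make the removal point concrete, here is a deliberately simplified sketch of the 3-way merge idea for flat string maps only; the real strategic merge patch also handles nested objects, lists with merge keys, and patch directives:

// threeWayMerge shows why the previous configuration is needed to detect
// removals safely, using flat maps as a stand-in for real objects.
func threeWayMerge(lastApplied, newConfig, live map[string]string) map[string]string {
	result := map[string]string{}
	// Start from live state so fields set by other clients are preserved.
	for k, v := range live {
		result[k] = v
	}
	// Fields present in the previous config but dropped from the new one
	// were removed by the user and can safely be deleted.
	for k := range lastApplied {
		if _, stillWanted := newConfig[k]; !stillWanted {
			delete(result, k)
		}
	}
	// Fields in the new config always win.
	for k, v := range newConfig {
		result[k] = v
	}
	return result
}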

Regarding the size increase, can we resolve the problem with the annotation size limit by increasing it?

@liggitt
Member Author

liggitt commented Oct 19, 2015

For users of kubectl apply, the annotations are not unnecessary, as they record the previous configuration specified by the user, which otherwise is not recorded anywhere, for use in the 3-way merge, the other 2 inputs being the new configuration and the live API state.

I see. It's unfortunate we don't have access to historic resourceVersions via the API. Putting copies of old versions in as annotations seems like a really messy way to accomplish that.

Originally I intended only kubectl apply to set/update these annotations, so that users/clients that didn't use kubectl apply would not be affected. We could potentially revert to that behavior, at the cost of making kubectl apply somewhat less convenient.

Updating the output of so many commands, and affecting all objects in the system created by someone just using kubectl seems like a lot of impact to support one feature. Can we start by making the annotation opt-in (via an invocation of apply or an option to the mutating commands), or making the historic and desired versions be input to the apply command?

can we resolve the problem with the annotation size limit by increasing it?

That seems like a bandaid. Were performance implications considered?

cc @smarterclayton @deads2k

@bgrant0607
Member

@liggitt Access to historic resourceVersions via the API is not sufficient. @jackgr or I should write up a design doc at some point, but you can see a concrete example here: #13007 (comment)

We could make apply opt-in by only creating the annotations in apply, and otherwise only updating them if they already exist.
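
In code terms, the opt-in rule being proposed would look roughly like this (hypothetical helper; only the annotation key is real):

func maybeRecordConfiguration(annotations map[string]string, serialized string, isApply bool) {
	const key = "kubectl.kubernetes.io/last-applied-configuration"
	// apply always records the configuration; other mutating commands only
	// refresh an annotation that a previous apply already created.
	if _, exists := annotations[key]; isApply || exists {
		annotations[key] = serialized
	}
}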

Re. performance: The impact was not measured, no. I'm not concerned so much about the performance of apply itself. In general, we need to move everything towards writing and reading just the parts of resources they need to touch (#1459).

Note that pods, which should be the most numerous and most frequently updated resources, should not normally have the annotations applied, since users should be creating pods via controllers.

@bgrant0607
Member

cc @wojtek-t re. whether we've observed a performance impact

@smarterclayton
Contributor

Since this halves the space available for annotations on every resource, I think it should be more narrowly scoped. Doubling the annotation size I think puts us above the etcd 2.0 value limit.

@liggitt
Member Author

liggitt commented Oct 20, 2015

Less than half, potentially. If you create a replication controller with a large number of containers or volumes, etc., you could potentially hit the size limit with just the apply annotation. Is it likely? No, but core commands having edge cases like that concerns me.

@bgrant0607
Member

@liggitt What scenario are you seeing where resources are close to the size limit? 64K seems like a lot.

@bgrant0607
Member

What is the etcd value size limit? I see a limit of 1MB for raft messages.

@bgrant0607
Member

BTW, annotations were originally added to the system for this precise purpose (#1201), so I'd like to find a way to make them work.

@smarterclayton
Contributor

Slightly less than that due to metadata. Are we only at 64k for annotations? I thought we allowed up to 240k - must be misremembering.

@bgrant0607
Member

@smarterclayton See the line of code setting totalAnnotationSizeLimitB that I copy-pasted above.

@smarterclayton
Contributor

Fair, but for components that are already doing that it doubles the size of those objects (OpenShift encodes the current deployment config into an RC, but we would now need to strip that field). Not a total obstacle, but it does feel aggressive to carve out that state assuming every CLI operation would eventually result in an apply. I had shared Jordan's assumption that only resources created via apply would inherit this state.

@liggitt
Member Author

liggitt commented Oct 20, 2015

What scenario are you seeing where resources are close the size limit? 64K seems like a lot.

  • A secret containing several files in its data map (I would assume any sort of config object as well, once that gets figured out)
  • OpenShift's template object (essentially a persisted List + parameter definitions), which also had the distinction of running up against the 64k annotation limit in practice, as template providers wanted to use annotations to store things like icons and descriptions

It's especially painful when this annotation makes the object fail with a "FieldValueTooLong" error on the annotations field when the user didn't actually ask for any annotations.

@bgrant0607 bgrant0607 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Oct 20, 2015
@bgrant0607 bgrant0607 added this to the v1.1 milestone Oct 20, 2015
@bgrant0607
Member

I propose the following for 1.1:

  1. Only apply should create these annotations. Other commands may update them if they exist, but should not add the annotations if they do not.
  2. Increase the annotation size limit.

For 1.2 or later we should consider:

  1. Elide field values where possible (e.g., large secret data values); a rough sketch follows this list. Note that values of strategic merge key fields can't be omitted.
  2. Better error messages.
  3. Resource size enforcement (i.e., not just/specifically annotation size).
  4. Aggregate resource size enforcement via ResourceQuota.
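
A rough sketch of the elision idea in item 1, using a hypothetical helper; real merge keys are defined per list type, so a flat set of field names is a simplification:

func elideLargeValues(obj map[string]interface{}, mergeKeys map[string]bool, maxValueLen int) {
	for k, v := range obj {
		switch val := v.(type) {
		case string:
			// Strategic merge key values (e.g. a container's "name") must stay
			// intact, otherwise list entries can no longer be matched up.
			if !mergeKeys[k] && len(val) > maxValueLen {
				obj[k] = "" // value elided; the field itself is still recorded
			}
		case map[string]interface{}:
			elideLargeValues(val, mergeKeys, maxValueLen)
		case []interface{}:
			for _, item := range val {
				if m, ok := item.(map[string]interface{}); ok {
					elideLargeValues(m, mergeKeys, maxValueLen)
				}
			}
		}
	}
}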

@jackgr
Contributor

jackgr commented Oct 20, 2015

+1

@smarterclayton
Contributor

Thanks, +1

@bgrant0607
Member

Proposed annotation size limit: 256k?
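
That would be a one-line change to the constant quoted earlier:

const totalAnnotationSizeLimitB int = 256 * (1 << 10) // 256 kB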

@liggitt
Member Author

liggitt commented Oct 20, 2015

256k doesn't seem insane, though with the primary fetch mechanism being lists of all objects of a type, with no field filtering, I really hope most objects don't make use of that headroom.

@bgrant0607
Member

I agree we're going to need field filtering.

@bgrant0607
Member

But not for 1.1.

@bgrant0607
Member

Waiting on cherry-pick of #16207 @janetkuo

@bgrant0607
Member

Done

NicolasT added a commit to scality/metalk8s that referenced this issue Jul 27, 2018
Rationale: `replaced` and `present` use `kubectl apply` to create the
desired objects, which internally sets the old object state as an
annotation. This breaks completely when the object is large, e.g. a
`ConfigMap` containing a large Grafana dashboard.

See: #216
See: kubernetes/kubernetes#15878
cloud-robotics-github-robot pushed a commit to googlecloudrobotics/core that referenced this issue Feb 15, 2019
This change introduces a Go binary that takes a protobuf message
with a few parameters and produces a CRD YAML file for Kubernetes. This
CRD YAML file can be passed to kubectl or put in a Helm chart for later
installation.

We put the proto descriptor into an annotation. The size limit for
annotations is 256k
(kubernetes/kubernetes#15878), and a hello
world descriptor is 9k. We could probably reduce the descriptor size
with some pruning of unused messages.

Change-Id: Ia3c7708989075176ea305f8c81560fa0330c0409
GitOrigin-RevId: 7998b5e
@elvizlai

Can this be made a little larger?

For the istio EnvoyFilter, the limit is not large enough to hold a grpc descriptor bin file.

@dims
Member

dims commented Oct 28, 2019

@elvizlai please propose a PR with the change you need and we can debate there. Folks don't usually look at chatter in closed issues.
