Add a pull secrets manager to cache duplicate secrets #19188

Closed
adohe-zz opened this issue Dec 30, 2015 · 43 comments
Assignees: wojtek-t
Labels: priority/important-soon · sig/node
Milestone: v1.6
@adohe-zz

FYI: we should add a secret manager to cache the pull secrets for different namespaces, so that for duplicate secrets the kubelet has no need to talk to the apiserver again. I am actually working on this manager, but I have no idea when a secret should be removed from the cache.

@fgrzadkowski
Contributor

Is this supposed to be a manager running in kubelet?

@fgrzadkowski fgrzadkowski added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 30, 2015
@fgrzadkowski
Contributor

@kubernetes/goog-node

@adohe-zz
Author

adohe-zz commented Jan 4, 2016

@fgrzadkowski yes, this should be a manager running in the kubelet. I am working on this, but I have no idea when to remove the secrets.

@vishh
Contributor

vishh commented Jan 4, 2016

Can you post a proposal before you start working on a PR?


@adohe-zz
Author

adohe-zz commented Jan 5, 2016

@vishh sure I will.

@adohe-zz
Author

adohe-zz commented Jan 5, 2016

Proposal to create a secret manager

Background

The kubelet retrieves duplicate secrets multiple times, and there is no cache for secrets in the current implementation, so this duplicate retrieval is hard to avoid.

Motivation

Add a separate secret manager to avoid duplicate secret retrieval and make the problem easier to address.

Proposed Design

Design of secret manager

SecretManager Interface

type SecretManager interface {
    // Get secret by secret ref name
    GetSecretByName(name string) (*api.Secret, bool)

    // Get secrets by namespace
    GetSecretsByNamespace(namespace string) ([]*api.Secret, bool)

    // Delete all secrets of the namespace
    DeleteSecretsForNamespace(namespace string)

    // Update internal secrets cache
    UpdateSecrets()
}

Implementation

The kubelet gets pull secrets directly from the secret manager; if a secret is absent, the secret manager requests it from the apiserver. However, I don't yet have a good answer for when to delete cached secrets, or whether we should start a separate goroutine to update the internal secrets cache.
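
A minimal, self-contained sketch of the get-through-cache behavior described above (illustrative only, not the actual kubelet code; the `Secret` struct and `fetchFunc` callback are stand-ins for `api.Secret` and the apiserver client):

```go
package main

import (
    "fmt"
    "strings"
    "sync"
)

// Secret is a stand-in for api.Secret so the sketch is self-contained.
type Secret struct {
    Namespace string
    Name      string
    Data      map[string][]byte
}

// fetchFunc is a hypothetical callback that retrieves a secret from the
// apiserver on a cache miss.
type fetchFunc func(namespace, name string) (*Secret, error)

// cachingSecretManager serves Get calls from an in-memory cache and only
// falls back to the apiserver for secrets it has not seen before.
type cachingSecretManager struct {
    mu    sync.RWMutex
    cache map[string]*Secret // keyed by "namespace/name"
    fetch fetchFunc
}

func newCachingSecretManager(fetch fetchFunc) *cachingSecretManager {
    return &cachingSecretManager{cache: make(map[string]*Secret), fetch: fetch}
}

// GetSecret returns the cached secret, fetching it once on a miss.
func (m *cachingSecretManager) GetSecret(namespace, name string) (*Secret, error) {
    key := namespace + "/" + name

    m.mu.RLock()
    s, ok := m.cache[key]
    m.mu.RUnlock()
    if ok {
        return s, nil
    }

    // Cache miss: ask the apiserver once and remember the result.
    s, err := m.fetch(namespace, name)
    if err != nil {
        return nil, err
    }
    m.mu.Lock()
    m.cache[key] = s
    m.mu.Unlock()
    return s, nil
}

// DeleteSecretsForNamespace drops all cached secrets of a namespace, e.g.
// when the last pod of that namespace is removed from the node.
func (m *cachingSecretManager) DeleteSecretsForNamespace(namespace string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    for key := range m.cache {
        if strings.HasPrefix(key, namespace+"/") {
            delete(m.cache, key)
        }
    }
}

func main() {
    calls := 0
    m := newCachingSecretManager(func(ns, name string) (*Secret, error) {
        calls++ // count apiserver round-trips
        return &Secret{Namespace: ns, Name: name}, nil
    })
    m.GetSecret("default", "pull-secret")
    m.GetSecret("default", "pull-secret")  // served from the cache
    fmt.Println("apiserver calls:", calls) // prints 1
}
```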

@adohe-zz
Author

adohe-zz commented Jan 5, 2016

@vishh would you mind reviewing this?

@vishh
Contributor

vishh commented Jan 19, 2016

The proposal SGTM. Keeping the cache updated in the background will help
reduce the pod startup latency.


@adohe-zz
Author

@vishh sure, I will keep the cache updated in the background.

@therc
Member

therc commented Jan 21, 2016

It would be great if this could serve as the basis for the LOAS daemon from #2209, fetching secrets. The opaque proxying part would come later, perhaps moving into a separate process.

@yujuhong
Contributor

By the way, I just learned that kubelet talks to the apiserver to get the secret again for every pod on every sync.

http://kubernetes.io/docs/user-guide/secrets/
> When a secret being already consumed in a volume is updated, projected keys are eventually updated as well. The update time depends on the kubelet syncing period.

This is quite undesirable, especially if pods all use the same secret.

/cc @pmorie

@wojtek-t
Member

This is a source of huge load in large clusters. It has to be solved to make the cluster scalable to 5k nodes.
I'm bumping the priority of this and moving it to the 1.6 milestone.

Regarding the proposal, there is a big hole, which is how we update secrets in the cache.

I think that, short term, what we can do is:

  • the kubelet keeps a reflector open for all secrets, and all Get() operations are served from the underlying cache - this change is trivial to do.
    This would be problematic if there are a lot of secrets in the system, though.

So the way we should solve it is:

  • SecretManager has a list of all Secrets that all pods from a given kubelet are using

  • whenever we create/update/delete a pod, we are updating this list in SecretManager

  • SecretManager has a watch on all secrets - whenever a watch event for a secret is coming it:
    a) if it is interested in a given secret, updates its value in a cache
    b) if it's not interested, simply drops it

  • additionally, we should issue a Get() operation for a secret when it's first registered in SecretManager

    This change is also relatively simple (a rough sketch follows below) - I'm happy to implement it if you agree it makes sense.
    @dchen1107 @yujuhong @vishh - any thoughts?
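
A rough, self-contained sketch of the registration-plus-filtering idea above (illustrative only; `watchingSecretManager`, `RegisterPod`, and `OnSecretEvent` are hypothetical names, and `Secret` stands in for `api.Secret`):

```go
package main

import (
    "fmt"
    "sync"
)

// Secret is a stand-in for api.Secret so the sketch is self-contained.
type Secret struct {
    Namespace, Name string
    ResourceVersion string
}

// watchingSecretManager keeps a reference-counted set of secrets that pods on
// this kubelet use, and only caches watch events for those secrets.
type watchingSecretManager struct {
    mu       sync.Mutex
    refCount map[string]int     // "namespace/name" -> number of pods using it
    cache    map[string]*Secret // current values of the secrets we care about
}

func newWatchingSecretManager() *watchingSecretManager {
    return &watchingSecretManager{
        refCount: make(map[string]int),
        cache:    make(map[string]*Secret),
    }
}

func key(namespace, name string) string { return namespace + "/" + name }

// RegisterPod is called on pod add/update with the secrets the pod references.
func (m *watchingSecretManager) RegisterPod(secretRefs []string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    for _, k := range secretRefs {
        m.refCount[k]++
        // The real manager would also issue an initial Get() here so a
        // freshly registered pod never sees an empty cache.
    }
}

// UnregisterPod is called on pod deletion; unreferenced secrets are evicted.
func (m *watchingSecretManager) UnregisterPod(secretRefs []string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    for _, k := range secretRefs {
        if m.refCount[k]--; m.refCount[k] <= 0 {
            delete(m.refCount, k)
            delete(m.cache, k)
        }
    }
}

// OnSecretEvent would be wired to a watch on secrets: events for secrets no
// pod on this node uses are dropped, the rest update the cache.
func (m *watchingSecretManager) OnSecretEvent(s *Secret) {
    k := key(s.Namespace, s.Name)
    m.mu.Lock()
    defer m.mu.Unlock()
    if _, interested := m.refCount[k]; !interested {
        return // not used by any pod on this kubelet
    }
    m.cache[k] = s
}

func main() {
    m := newWatchingSecretManager()
    m.RegisterPod([]string{"default/pull-secret"})
    m.OnSecretEvent(&Secret{Namespace: "default", Name: "pull-secret", ResourceVersion: "2"})
    m.OnSecretEvent(&Secret{Namespace: "other", Name: "ignored", ResourceVersion: "1"})
    fmt.Println(len(m.cache)) // 1: only the registered secret is cached
}
```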

@wojtek-t wojtek-t added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Nov 17, 2016
@wojtek-t wojtek-t added this to the v1.6 milestone Nov 17, 2016
@wojtek-t
Member

@gmarek

@wojtek-t
Member

In the meantime I've refactored the code to create a SecretManager interface - it will be easier to implement this with #36984.

@adohe-zz
Author

Really glad to see this :)

@yujuhong
Contributor

@wojtek-t there were concerns about kubelet watching all secrets. See discussion in #32767

@gmarek
Contributor

gmarek commented Nov 17, 2016

I guess we can create a watch per secret, or something along those lines if that would work better. The current logic is unmaintainable from a scalability point of view.

@wojtek-t
Member

@yujuhong - if you have a better suggestion, I'm completely fine with it, but we need to change this if we want to scale further (and that is our goal).

Do you think that (at least short term) it's fine to modify my proposal so that:

  • we register with SecretManager which secrets we are interested in (whenever any pod on the kubelet is added/updated/deleted)
  • for each of those secrets, we refresh it every minute or so

This would also address the most common case of each pod just having a default service account.

Does it make sense? If so, I'm happy to implement it (a rough sketch of the refresh idea follows below).
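
A minimal, self-contained sketch of that register-and-refresh idea (illustrative only, not the code from #36984; `ttlSecretManager`, its `fetch` callback, and the `Secret` struct are stand-ins). Secrets are served from a cache and re-fetched only when older than the refresh period, with an optional forced GET when a pod is created:

```go
package main

import (
    "sync"
    "time"
)

// Secret is a stand-in for api.Secret so the sketch is self-contained.
type Secret struct {
    Namespace, Name string
    Data            map[string][]byte
}

type cacheEntry struct {
    secret    *Secret
    lastFetch time.Time
}

// ttlSecretManager serves secrets from a cache and re-fetches a secret from
// the apiserver only when its cached copy is older than ttl.
type ttlSecretManager struct {
    mu    sync.Mutex
    ttl   time.Duration
    cache map[string]cacheEntry
    fetch func(namespace, name string) (*Secret, error) // hypothetical apiserver call
}

func newTTLSecretManager(ttl time.Duration, fetch func(ns, name string) (*Secret, error)) *ttlSecretManager {
    return &ttlSecretManager{ttl: ttl, cache: make(map[string]cacheEntry), fetch: fetch}
}

// GetSecret returns the cached value if it is fresh enough; otherwise it
// fetches the secret again. Callers therefore see data at most ttl old,
// or current data if they force a refresh (e.g. on pod creation).
func (m *ttlSecretManager) GetSecret(namespace, name string, forceRefresh bool) (*Secret, error) {
    key := namespace + "/" + name
    m.mu.Lock()
    defer m.mu.Unlock()

    if e, ok := m.cache[key]; ok && !forceRefresh && time.Since(e.lastFetch) < m.ttl {
        return e.secret, nil
    }
    s, err := m.fetch(namespace, name)
    if err != nil {
        return nil, err
    }
    m.cache[key] = cacheEntry{secret: s, lastFetch: time.Now()}
    return s, nil
}

func main() {
    m := newTTLSecretManager(time.Minute, func(ns, name string) (*Secret, error) {
        return &Secret{Namespace: ns, Name: name}, nil
    })
    // Pod creation: force a fresh GET so the pod starts with the latest secret.
    _, _ = m.GetSecret("default", "default-token", true)
    // Subsequent syncs within the next minute are served from the cache.
    _, _ = m.GetSecret("default", "default-token", false)
}
```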

@yujuhong
Contributor

Don't get me wrong. I agree that this is a scaling/performance bottleneck and stated that clearly in the issue before. I was simply pointing out that there were concerns about letting the kubelet watch all secrets :)

One issue with your proposal is that kubelet doesn't know when a secret is updated (since we don't use "watch"). This is fine for existing pods since there is no guarantee on the update delivery latency yet, so we can rely on the 1 min refresh period. However, for new pods being added to kubelet, are we ok with them using the potentially stale (up to 1 min) secrets, or do we need to fetch the latest secret? I think it's worth figuring out whether we can tolerate this, but it's probably going to be a hard sell.

If we absolutely need the latest secret, each new pod will incur a new GET, but will rely on the 1 min refresh after that. This is still better than we have today, but not as great when batch creating pods.

@wojtek-t
Member

> If we absolutely need the latest secret, each new pod will incur a new GET, but will rely on the 1 min refresh after that. This is still better than we have today, but not as great when batch creating pods.

I think that for safety, let's assume we need the latest secret, and issue a GET whenever a pod is created (or updated?).
With that modification - are you ok with the proposal?

@yujuhong
Contributor

> With that modification - are you ok with the proposal?

Looks good to me overall.

Demoted to p1 because this issue is not urgent.

@liggitt
Member

liggitt commented Nov 18, 2016

@pmorie what is the current lag to update a configmap/secret volume from when it updates in the API?

I want to be really careful not to make the kubelet depend on being able to watch all secrets (or config maps… I assume they have the same issues here). Our eventual goal (in order to reduce the impact of a compromised node) is to limit the kubelet's access to just the secrets mounted into pods scheduled to that node. Implementing watch on a virtual resource based on the intersection of multiple resources is really tough (@deads2k has done it, but it's thorny and usually comes with caveats).

@yujuhong
Contributor

@liggitt, @wojtek-t's proposal in #19188 (comment) doesn't involve watching secrets IIUC. The secret manager will simply get the secrets for pods assigned to the node every minute. This allows one GET every minute to serve all pods using the same secret.

@gmarek
Contributor

gmarek commented Nov 18, 2016

Quick math for the current state: 150k Pods, each getting at least one secret every 60 s - that's 150,000 / 60 = 2.5k QPS on the master, which is a lot. If instead we have one GET every 60 s per kubelet, it's roughly 80 QPS, but I don't know how to do that kind of GET in the current system other than what @wojtek-t proposed, i.e. issuing a single GET per secret.

@gmarek
Contributor

gmarek commented Nov 18, 2016

@liggitt - if you're worried about a compromised kubelet, I don't think the kubelet's default behavior is an issue at all. Someone with access to the kubelet can currently just get (or list) any Secret they want. Once you introduce additional constraints on which secrets kubelets can get, maybe we can just apply the same policy to lists as well (i.e. when a kubelet lists all Secrets, it only gets the ones used by Pods assigned to it)?

@wojtek-t
Member

@liggitt - if a node (kubelet) is compromised, you can send any requests to the apiserver anyway, so I don't fully understand this concern.

Anyway - my current proposal is that the kubelet keeps a cache of only the secrets that pods on that node are using; we are not listing or watching any other secrets. We are just avoiding issuing GETs for the same secret multiple times, which should already be a clear gain. IIUC this addresses your concerns.

@wojtek-t
Member

@yujuhong @liggitt - #36984 is the implementation that I would like to submit as soon as the code freeze is finished. If you could take a quick look to see whether the approach is OK with you, I will probably add unit tests for it sometime next week.

@deads2k
Contributor

deads2k commented Nov 18, 2016

> @liggitt - if a node (kubelet) is compromised, you can send any requests to the apiserver anyway, so I don't fully understand this concern.

Fixing the authorizer to more tightly protect individual resources is something we're interested in. We shaped node identities to enable us to do it, but it's an after-RBAC objective. We just don't want features in the kubelet that we know will violate the eventual rules, and watching all secrets runs afoul of that.

@wojtek-t
Member

@deads2k - what I did in #36984 is NOT watching all secrets; in fact it's not watching anything - it just avoids issuing the same gets all the time for multiple pods.

@gmarek
Contributor

gmarek commented Nov 18, 2016

@deads2k - what do you think about Wojtek's proposal of making a single request per X seconds per Secret instance (not per Secret reference)?

@deads2k
Contributor

deads2k commented Nov 18, 2016

> @deads2k - what do you think about Wojtek's proposal of making a single request per X seconds per Secret instance (not per Secret reference)?

That sounds fine from an API authorization perspective.

@wojtek-t
Member

@liggitt @pmorie - in the initial version of my PR, my purpose was to introduce this SecretManager and integrate it into Kubelet. Now, I would like to extend it to secrets mounted as volumes. But I cannot figure out the path via which secrets from volumes are periodically updated. Can you please point me to the place where secrets are being periodically re-read from apiserver?

The only place I can see that issues a GET is this one:
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/secret/secret.go#L196
But I have no idea how this is being called periodically...

Any help is appreciated.

@wojtek-t
Member

wojtek-t commented Jan 2, 2017

OK, I figured it out. So what is happening is that:

I think that in order to fix the problem with that, we should introduce some dedicated logic to volume manager reconciler that will be:

  • checking if the volume is a secret (that's already doable)
  • if not, everything remains as is
  • if it is a secret, we will try to remount it not more often than every X minutes, or when the corresponding Secret has been updated in SecretManager since the last reconciliation

Once the SecretManager interface is finally merged, that should be a relatively small change (a rough sketch of the remount check is below).
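
A tiny sketch of one reading of that remount rule (hypothetical names, not the volume manager's actual types): remount when the cached secret's version changed since the last reconciliation, and otherwise at most once per period.

```go
package main

import (
    "fmt"
    "time"
)

// secretVolumeState is the bookkeeping a reconciler could keep per mounted
// secret volume (illustrative only).
type secretVolumeState struct {
    lastRemount    time.Time
    mountedVersion string // resource version of the secret at the last remount
}

// shouldRemount returns true if the secret changed in the SecretManager since
// the last reconciliation, or if at least minPeriod has passed otherwise.
func shouldRemount(state secretVolumeState, currentVersion string, minPeriod time.Duration, now time.Time) bool {
    if currentVersion != state.mountedVersion {
        return true // the cached secret was updated; propagate it to the volume
    }
    return now.Sub(state.lastRemount) >= minPeriod
}

func main() {
    state := secretVolumeState{lastRemount: time.Now().Add(-30 * time.Second), mountedVersion: "41"}

    // Unchanged secret, only 30s since the last remount: skip the remount.
    fmt.Println(shouldRemount(state, "41", time.Minute, time.Now())) // false

    // The SecretManager saw a newer version: remount regardless of the period.
    fmt.Println(shouldRemount(state, "42", time.Minute, time.Now())) // true
}
```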

@liggitt
Member

liggitt commented Jan 3, 2017

(same goes for configmaps as well)

@gmarek
Contributor

gmarek commented Jan 4, 2017

If I'm not mistaken, configmaps are (slightly) less problematic, because by default Pods don't reference any (unlike Secrets, where every Pod has the default ServiceAccount token).

@wojtek-t
Member

wojtek-t commented Jan 5, 2017

Yes. We should do the same for config maps, but that's slightly less urgent.

k8s-github-robot pushed a commit that referenced this issue Jan 20, 2017
Automatic merge from submit-queue

Create SecretManager interface and switch to caching secrets in kubelet

Ref #19188

Obviously we would need to extend the interface to solve #19188 but this is good first step anyway.
k8s-github-robot pushed a commit that referenced this issue Jan 23, 2017
Automatic merge from submit-queue (batch tested with PRs 40205, 40208)

Make secret volume plugin use secret manager

Ref #19188

@gmarek
@wojtek-t wojtek-t self-assigned this Jan 24, 2017
@wojtek-t
Member

@liggitt @yujuhong - we pretty much have the secret manager now implemented. Can we close this issue?

@liggitt
Member

liggitt commented Jan 24, 2017

Is there a related issue to mirror this for configmaps? I really want to keep parity for those objects.

@wojtek-t
Member

Good point - I'm not aware of any other issue.

@yujuhong
Contributor

@wojtek-t, please also update the doc to reflect the changes to the secret updates. Thanks!
https://github.com/kubernetes/kubernetes.github.io/blob/release-1.5/docs/user-guide/secrets/index.md

spxtr pushed a commit to spxtr/kubernetes that referenced this issue Jan 26, 2017
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)

Optimize secret manager to refresh secrets from apiserver cache

Ref kubernetes#19188

@liggitt
@wojtek-t
Member

@yujuhong - I've sent out a doc update in kubernetes/website#2372

@liggitt
Member

liggitt commented Feb 14, 2017

opened #41379 to track configmaps

@liggitt liggitt closed this as completed Feb 14, 2017