Add a pull secrets manager to cache duplicate secrets #19188

Closed
adohe-zz opened this issue Dec 30, 2015 · 43 comments
Assignees: wojtek-t
Labels: priority/important-soon · sig/node
Milestone: v1.6
@adohe-zz

FYI: we should add a secret manager to cache the pull secrets for different namespaces, so that for duplicate secrets the kubelet has no need to talk to the apiserver again. I am actually working on this manager, but I have no idea when a secret should be removed from the cache.

@fgrzadkowski
Contributor

Is this supposed to be a manager running in kubelet?

@fgrzadkowski fgrzadkowski added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 30, 2015
@fgrzadkowski
Contributor

@kubernetes/goog-node

@adohe-zz
Author

adohe-zz commented Jan 4, 2016

@fgrzadkowski yes, this should be a manager running in the kubelet. I am working on this, but I have no idea when to remove the secrets.

@vishh
Contributor

vishh commented Jan 4, 2016

Can you post a proposal before you start working on a PR?


@adohe-zz
Author

adohe-zz commented Jan 5, 2016

@vishh sure I will.

@adohe-zz
Author

adohe-zz commented Jan 5, 2016

Proposal to create a secret manager

Background

The kubelet retrieves duplicate secrets multiple times, and there is no cache for secrets in the current implementation, so this duplicate retrieval is hard to avoid.

Motivation

Add a separate secret manager to avoid duplicate secret retrieval and make the problem easier to address.

Proposed Design

Design of secret manager

SecretManager Interface

type SecretManager interface {
    // Get secret by secret ref name
    GetSecretByName(name string) (*api.Secret, bool)

    // Get secrets by namespace
    GetSecretsByNamespace(namespace string) ([]*api.Secret, bool)

    // Delete all secrets of the namespace
    DeleteSecretsForNamespace(namespace string)

    // Update internal secrets cache
    UpdateSecrets()
}

Implementation

The kubelet gets pull secrets directly from the secret manager; if a secret is absent, the secret manager requests it from the apiserver. However, I don't yet have a good answer for when to delete cached secrets, or whether we should start a separate goroutine to update the internal secrets cache.
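
A minimal, self-contained sketch of the get-through-cache behavior described above (illustrative only, not the actual kubelet code; the `Secret` struct and `fetchFunc` callback are stand-ins for `api.Secret` and the apiserver client):

```go
package main

import (
    "fmt"
    "strings"
    "sync"
)

// Secret is a stand-in for api.Secret so the sketch is self-contained.
type Secret struct {
    Namespace string
    Name      string
    Data      map[string][]byte
}

// fetchFunc is a hypothetical callback that retrieves a secret from the
// apiserver on a cache miss.
type fetchFunc func(namespace, name string) (*Secret, error)

// cachingSecretManager serves Get calls from an in-memory cache and only
// falls back to the apiserver for secrets it has not seen before.
type cachingSecretManager struct {
    mu    sync.RWMutex
    cache map[string]*Secret // keyed by "namespace/name"
    fetch fetchFunc
}

func newCachingSecretManager(fetch fetchFunc) *cachingSecretManager {
    return &cachingSecretManager{cache: make(map[string]*Secret), fetch: fetch}
}

// GetSecret returns the cached secret, fetching it once on a miss.
func (m *cachingSecretManager) GetSecret(namespace, name string) (*Secret, error) {
    key := namespace + "/" + name

    m.mu.RLock()
    s, ok := m.cache[key]
    m.mu.RUnlock()
    if ok {
        return s, nil
    }

    // Cache miss: ask the apiserver once and remember the result.
    s, err := m.fetch(namespace, name)
    if err != nil {
        return nil, err
    }
    m.mu.Lock()
    m.cache[key] = s
    m.mu.Unlock()
    return s, nil
}

// DeleteSecretsForNamespace drops all cached secrets of a namespace, e.g.
// when the last pod of that namespace is removed from the node.
func (m *cachingSecretManager) DeleteSecretsForNamespace(namespace string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    for key := range m.cache {
        if strings.HasPrefix(key, namespace+"/") {
            delete(m.cache, key)
        }
    }
}

func main() {
    calls := 0
    m := newCachingSecretManager(func(ns, name string) (*Secret, error) {
        calls++ // count apiserver round-trips
        return &Secret{Namespace: ns, Name: name}, nil
    })
    m.GetSecret("default", "pull-secret")
    m.GetSecret("default", "pull-secret")  // served from the cache
    fmt.Println("apiserver calls:", calls) // prints 1
}
```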

@adohe-zz
Author

adohe-zz commented Jan 5, 2016

@vishh would you mind reviewing this?

@vishh
Contributor

vishh commented Jan 19, 2016

The proposal SGTM. Keeping the cache updated in the background will help
reduce the pod startup latency.


@adohe-zz
Author

@vishh sure, I will keep the cache updated in the background.

@therc
Member

therc commented Jan 21, 2016

It would be great if this could serve as the basis for the LOAS daemon from #2209, fetching secrets. The opaque proxying part would come later, perhaps moving into a separate process.

@yujuhong
Contributor

By the way, I just learned that kubelet talks to the apiserver to get the secret again for every pod on every sync.

http://kubernetes.io/docs/user-guide/secrets/
> When a secret being already consumed in a volume is updated, projected keys are eventually updated as well. The update time depends on the kubelet syncing period.

This is quite undesirable, especially if pods all use the same secret.

/cc @pmorie

@wojtek-t
Member

This is a source of huge load in large clusters. It has to be solved to make the cluster scalable to 5k nodes.
I'm bumping the priority of this and moving it to the 1.6 milestone.

Regarding the proposal, there is a big hole, which is how we update secrets in the cache.

I think that, short term, what we can do is:

  • the kubelet keeps a reflector open for all secrets, and all Get() operations are served from the underlying cache - this change is trivial to do.
    This would be problematic if there are a lot of secrets in the system, though.

So the way we should solve it is:

  • SecretManager has a list of all Secrets that all pods from a given kubelet are using

  • whenever we create/update/delete a pod, we are updating this list in SecretManager

  • SecretManager has a watch on all secrets - whenever a watch event for a secret is coming it:
    a) if it is interested in a given secret, updates its value in a cache
    b) if it's not interested, simply drops it

  • additionally, we should issue a Get() operation for a secret when it's first registered in SecretManager

    This change is also relatively simple (a rough sketch follows below) - I'm happy to implement it if you agree it makes sense.
    @dchen1107 @yujuhong @vishh - any thoughts?
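
A rough, self-contained sketch of the registration-plus-filtering idea above (illustrative only; `watchingSecretManager`, `RegisterPod`, and `OnSecretEvent` are hypothetical names, and `Secret` stands in for `api.Secret`):

```go
package main

import (
    "fmt"
    "sync"
)

// Secret is a stand-in for api.Secret so the sketch is self-contained.
type Secret struct {
    Namespace, Name string
    ResourceVersion string
}

// watchingSecretManager keeps a reference-counted set of secrets that pods on
// this kubelet use, and only caches watch events for those secrets.
type watchingSecretManager struct {
    mu       sync.Mutex
    refCount map[string]int     // "namespace/name" -> number of pods using it
    cache    map[string]*Secret // current values of the secrets we care about
}

func newWatchingSecretManager() *watchingSecretManager {
    return &watchingSecretManager{
        refCount: make(map[string]int),
        cache:    make(map[string]*Secret),
    }
}

func key(namespace, name string) string { return namespace + "/" + name }

// RegisterPod is called on pod add/update with the secrets the pod references.
func (m *watchingSecretManager) RegisterPod(secretRefs []string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    for _, k := range secretRefs {
        m.refCount[k]++
        // The real manager would also issue an initial Get() here so a
        // freshly registered pod never sees an empty cache.
    }
}

// UnregisterPod is called on pod deletion; unreferenced secrets are evicted.
func (m *watchingSecretManager) UnregisterPod(secretRefs []string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    for _, k := range secretRefs {
        if m.refCount[k]--; m.refCount[k] <= 0 {
            delete(m.refCount, k)
            delete(m.cache, k)
        }
    }
}

// OnSecretEvent would be wired to a watch on secrets: events for secrets no
// pod on this node uses are dropped, the rest update the cache.
func (m *watchingSecretManager) OnSecretEvent(s *Secret) {
    k := key(s.Namespace, s.Name)
    m.mu.Lock()
    defer m.mu.Unlock()
    if _, interested := m.refCount[k]; !interested {
        return // not used by any pod on this kubelet
    }
    m.cache[k] = s
}

func main() {
    m := newWatchingSecretManager()
    m.RegisterPod([]string{"default/pull-secret"})
    m.OnSecretEvent(&Secret{Namespace: "default", Name: "pull-secret", ResourceVersion: "2"})
    m.OnSecretEvent(&Secret{Namespace: "other", Name: "ignored", ResourceVersion: "1"})
    fmt.Println(len(m.cache)) // 1: only the registered secret is cached
}
```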

@wojtek-t wojtek-t added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Nov 17, 2016
@wojtek-t wojtek-t added this to the v1.6 milestone Nov 17, 2016
@wojtek-t
Member

@gmarek

@wojtek-t
Member

In the meantime I've refactored the code to create a SecretManager interface - it will be easier to implement this with #36984.

@adohe-zz
Author

Really glad to see this :)

@yujuhong
Contributor

@wojtek-t there were concerns about kubelet watching all secrets. See discussion in #32767

@gmarek
Contributor

gmarek commented Nov 17, 2016

I guess we can create a watch per secret, or something along those lines if that would work better. The current logic is unmaintainable from a scalability point of view.

@wojtek-t
Member

@yujuhong - if you have a better suggestion, I'm completely fine with it, but we need to change this if we want to scale further (and that is our goal).

Do you think that (at least short term) it's fine to modify my proposal so that:

  • we register with SecretManager which secrets we are interested in (whenever any pod on the kubelet is added/updated/deleted)
  • for each of those secrets, we refresh it every minute or so

This would also address the most common case of each pod just having a default service account.

Does it make sense? If so, I'm happy to implement it (a rough sketch of the refresh idea follows below).
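
A minimal, self-contained sketch of that register-and-refresh idea (illustrative only, not the code from #36984; `ttlSecretManager`, its `fetch` callback, and the `Secret` struct are stand-ins). Secrets are served from a cache and re-fetched only when older than the refresh period, with an optional forced GET when a pod is created:

```go
package main

import (
    "sync"
    "time"
)

// Secret is a stand-in for api.Secret so the sketch is self-contained.
type Secret struct {
    Namespace, Name string
    Data            map[string][]byte
}

type cacheEntry struct {
    secret    *Secret
    lastFetch time.Time
}

// ttlSecretManager serves secrets from a cache and re-fetches a secret from
// the apiserver only when its cached copy is older than ttl.
type ttlSecretManager struct {
    mu    sync.Mutex
    ttl   time.Duration
    cache map[string]cacheEntry
    fetch func(namespace, name string) (*Secret, error) // hypothetical apiserver call
}

func newTTLSecretManager(ttl time.Duration, fetch func(ns, name string) (*Secret, error)) *ttlSecretManager {
    return &ttlSecretManager{ttl: ttl, cache: make(map[string]cacheEntry), fetch: fetch}
}

// GetSecret returns the cached value if it is fresh enough; otherwise it
// fetches the secret again. Callers therefore see data at most ttl old,
// or current data if they force a refresh (e.g. on pod creation).
func (m *ttlSecretManager) GetSecret(namespace, name string, forceRefresh bool) (*Secret, error) {
    key := namespace + "/" + name
    m.mu.Lock()
    defer m.mu.Unlock()

    if e, ok := m.cache[key]; ok && !forceRefresh && time.Since(e.lastFetch) < m.ttl {
        return e.secret, nil
    }
    s, err := m.fetch(namespace, name)
    if err != nil {
        return nil, err
    }
    m.cache[key] = cacheEntry{secret: s, lastFetch: time.Now()}
    return s, nil
}

func main() {
    m := newTTLSecretManager(time.Minute, func(ns, name string) (*Secret, error) {
        return &Secret{Namespace: ns, Name: name}, nil
    })
    // Pod creation: force a fresh GET so the pod starts with the latest secret.
    _, _ = m.GetSecret("default", "default-token", true)
    // Subsequent syncs within the next minute are served from the cache.
    _, _ = m.GetSecret("default", "default-token", false)
}
```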

@yujuhong
Contributor

Don't get me wrong. I agree that this is a scaling/performance bottleneck and stated that clearly in the issue before. I was simply pointing out that there were concerns about letting the kubelet watch all secrets :)

One issue with your proposal is that kubelet doesn't know when a secret is updated (since we don't use "watch"). This is fine for existing pods since there is no guarantee on the update delivery latency yet, so we can rely on the 1 min refresh period. However, for new pods being added to kubelet, are we ok with them using the potentially stale (up to 1 min) secrets, or do we need to fetch the latest secret? I think it's worth figuring out whether we can tolerate this, but it's probably going to be a hard sell.

If we absolutely need the latest secret, each new pod will incur a new GET, but will rely on the 1 min refresh after that. This is still better than we have today, but not as great when batch creating pods.

@wojtek-t
Member

> If we absolutely need the latest secret, each new pod will incur a new GET, but will rely on the 1 min refresh after that. This is still better than we have today, but not as great when batch creating pods.

I think that for safety, let's assume we need the latest secret, and issue a GET whenever a pod is created (or updated?).
With that modification - are you ok with the proposal?

@yujuhong
Contributor

> With that modification - are you ok with the proposal?

Looks good to me overall.

Demoted to p1 because this issue is not urgent.

@liggitt
Member

liggitt commented Nov 18, 2016

@pmorie what is the current lag to update a configmap/secret volume from when it updates in the API?

I want to be really careful not to make the kubelet depend on being able to watch all secrets (or config maps… I assume they have the same issues here). Our eventual goal (in order to reduce the impact of a compromised node) is to limit the kubelet's access to just the secrets mounted into pods scheduled to that node. Implementing watch on a virtual resource based on the intersection of multiple resources is really tough (@deads2k has done it, but it's thorny and usually comes with caveats).

@yujuhong
Contributor

@liggitt, @wojtek-t's proposal in #19188 (comment) doesn't involve watching secrets IIUC. The secret manager will simply get the secrets for pods assigned to the node every minute. This allows one GET every minute to serve all pods using the same secret.

@gmarek
Contributor

gmarek commented Nov 18, 2016

Quick math for the current state: 150k Pods, each getting at least one secret every 60 s - that's 150,000 / 60 = 2.5k QPS on the master, which is a lot. If instead we have one GET every 60 s per kubelet, it's roughly 80 QPS, but I don't know how to do that kind of GET in the current system other than what @wojtek-t proposed, i.e. issuing a single GET per secret.

@gmarek
Contributor

gmarek commented Nov 18, 2016

@liggitt - if you're worried about a compromised kubelet, I don't think the kubelet's default behavior is an issue at all. Someone with access to the kubelet can currently just get (or list) any Secret they want. Once you introduce additional constraints on which secrets kubelets can get, maybe we can just apply the same policy to lists as well (i.e. when a kubelet lists all Secrets, it only gets the ones used by Pods assigned to it)?

@wojtek-t
Member

@liggitt - if a node (kubelet) is compromised, you can send any requests to the apiserver anyway, so I don't fully understand this concern.

Anyway - my current proposal is that the kubelet keeps a cache of only the secrets that pods on that node are using; we are not listing or watching any other secrets. We are just avoiding issuing GETs for the same secret multiple times, which should already be a clear gain. IIUC this addresses your concerns.

@wojtek-t
Member

@yujuhong @liggitt - #36984 is the implementation that I would like to submit as soon as the code freeze is finished. If you could take a quick look to see whether the approach is OK with you, I will probably add unit tests for it sometime next week.

@deads2k
Contributor

deads2k commented Nov 18, 2016

> @liggitt - if a node (kubelet) is compromised, you can send any requests to the apiserver anyway, so I don't fully understand this concern.

Fixing the authorizer to more tightly protect individual resources is something we're interested in. We shaped node identities to enable us to do it, but it's an after-RBAC objective. We just don't want features in the kubelet that we know will violate the eventual rules, and watching all secrets runs afoul of that.

@wojtek-t
Member

@deads2k - what I did in #36984 is NOT watching all secrets; in fact it's not watching anything - it just avoids issuing the same gets all the time for multiple pods.

@gmarek
Contributor

gmarek commented Nov 18, 2016

@deads2k - what do you think about Wojtek's proposal of making a single request per X seconds per Secret instance (not per Secret reference)?

@deads2k
Contributor

deads2k commented Nov 18, 2016

> @deads2k - what do you think about Wojtek's proposal of making a single request per X seconds per Secret instance (not per Secret reference)?

That sounds fine from an API authorization perspective.

@wojtek-t
Member

@liggitt @pmorie - in the initial version of my PR, my purpose was to introduce this SecretManager and integrate it into Kubelet. Now, I would like to extend it to secrets mounted as volumes. But I cannot figure out the path via which secrets from volumes are periodically updated. Can you please point me to the place where secrets are being periodically re-read from apiserver?

The only place I can see that issues a GET is this one:
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/secret/secret.go#L196
But I have no idea how this is being called periodically...

Any help is appreciated.

@wojtek-t
Member

wojtek-t commented Jan 2, 2017

OK, I figured it out. So what is happening is that:

I think that in order to fix the problem with that, we should introduce some dedicated logic to volume manager reconciler that will be:

  • checking if the volume is a secret (that's already doable)
  • if not, everything remains as is
  • if it is a secret, we will try to remount it not more often than every X minutes, or when the corresponding Secret has been updated in SecretManager since the last reconciliation

Once the SecretManager interface is finally merged, that should be a relatively small change (a rough sketch of the remount check is below).
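
A tiny sketch of one reading of that remount rule (hypothetical names, not the volume manager's actual types): remount when the cached secret's version changed since the last reconciliation, and otherwise at most once per period.

```go
package main

import (
    "fmt"
    "time"
)

// secretVolumeState is the bookkeeping a reconciler could keep per mounted
// secret volume (illustrative only).
type secretVolumeState struct {
    lastRemount    time.Time
    mountedVersion string // resource version of the secret at the last remount
}

// shouldRemount returns true if the secret changed in the SecretManager since
// the last reconciliation, or if at least minPeriod has passed otherwise.
func shouldRemount(state secretVolumeState, currentVersion string, minPeriod time.Duration, now time.Time) bool {
    if currentVersion != state.mountedVersion {
        return true // the cached secret was updated; propagate it to the volume
    }
    return now.Sub(state.lastRemount) >= minPeriod
}

func main() {
    state := secretVolumeState{lastRemount: time.Now().Add(-30 * time.Second), mountedVersion: "41"}

    // Unchanged secret, only 30s since the last remount: skip the remount.
    fmt.Println(shouldRemount(state, "41", time.Minute, time.Now())) // false

    // The SecretManager saw a newer version: remount regardless of the period.
    fmt.Println(shouldRemount(state, "42", time.Minute, time.Now())) // true
}
```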

@liggitt
Member

liggitt commented Jan 3, 2017

(same goes for configmaps as well)

@gmarek
Contributor

gmarek commented Jan 4, 2017

If I'm not mistaken, configmaps are (slightly) less problematic, because by default Pods don't reference any (unlike Secrets, where every Pod has the default ServiceAccount token).

@wojtek-t
Member

wojtek-t commented Jan 5, 2017

Yes. We should do the same for config maps, but that's slightly less urgent.

k8s-github-robot pushed a commit that referenced this issue Jan 20, 2017
Automatic merge from submit-queue

Create SecretManager interface and switch to caching secrets in kubelet

Ref #19188

Obviously we would need to extend the interface to solve #19188 but this is good first step anyway.
k8s-github-robot pushed a commit that referenced this issue Jan 23, 2017
Automatic merge from submit-queue (batch tested with PRs 40205, 40208)

Make secret volume plugin use secret manager

Ref #19188

@gmarek
@wojtek-t wojtek-t self-assigned this Jan 24, 2017
@wojtek-t
Member

@liggitt @yujuhong - we pretty much have the secret manager now implemented. Can we close this issue?

@liggitt
Member

liggitt commented Jan 24, 2017

Is there a related issue to mirror this for configmaps? I really want to keep parity for those objects.

@wojtek-t
Member

Good point - I'm not aware of any other issue.

@yujuhong
Contributor

@wojtek-t, please also update the doc to reflect the changes to the secret updates. Thanks!
https://github.com/kubernetes/kubernetes.github.io/blob/release-1.5/docs/user-guide/secrets/index.md

spxtr pushed a commit to spxtr/kubernetes that referenced this issue Jan 26, 2017
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)

Optimize secret manager to refresh secrets from apiserver cache

Ref kubernetes#19188

@liggitt
@wojtek-t
Member

@yujuhong - I've sent out a doc update in kubernetes/website#2372

@liggitt
Member

liggitt commented Feb 14, 2017

opened #41379 to track configmaps

@liggitt liggitt closed this as completed Feb 14, 2017