Add a pull secrets manager to cache duplicate secrets #19188
Is this supposed to be a manager running in kubelet? |
@kubernetes/goog-node |
@fgrzadkowski yes, this should be a manager running in kubelet. I am working on this, but I have no idea when to remove the secrets. |
Can you post a proposal before you start working on a PR? |
@vishh sure I will. |
Proposal to create a secret manager

Background
Kubelet will retrieve duplicate secrets multiple times and there is no cache for secrets in the current implementation.

Motivation
Add a separate secret manager to avoid duplicate secret retrieval and to make this easier to address.

Proposed Design
Design of secret manager

Implementation
Kubelet gets pull secrets directly from the secret manager; if a secret is absent, the secret manager requests it from the apiserver. But I have no good idea about when to delete entries, or whether we should start a new goroutine to update the internal secrets cache. |
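A minimal sketch of the cache-or-fetch shape described in the proposal (the names Manager, GetSecret, fetchFunc, and the eviction-free map are illustrative assumptions, not the interface that was eventually merged):

```go
package secret

import (
	"fmt"
	"sync"
)

// Secret is a stand-in for the Kubernetes Secret type; kubelet code would use
// the real API object instead.
type Secret struct {
	Namespace, Name string
	Data            map[string][]byte
}

// fetchFunc retrieves a secret from the apiserver, e.g. a thin wrapper around
// the kubelet's API client.
type fetchFunc func(namespace, name string) (*Secret, error)

// Manager is the proposed secret manager: kubelet asks it for pull secrets,
// and it only falls through to the apiserver on a cache miss.
type Manager interface {
	GetSecret(namespace, name string) (*Secret, error)
}

type simpleSecretManager struct {
	mu    sync.Mutex
	cache map[string]*Secret
	fetch fetchFunc
}

func NewSimpleSecretManager(fetch fetchFunc) Manager {
	return &simpleSecretManager{cache: map[string]*Secret{}, fetch: fetch}
}

func (s *simpleSecretManager) GetSecret(namespace, name string) (*Secret, error) {
	key := fmt.Sprintf("%s/%s", namespace, name)

	s.mu.Lock()
	cached, ok := s.cache[key]
	s.mu.Unlock()
	if ok {
		return cached, nil // duplicate request served from the cache
	}

	// Cache miss: go to the apiserver once and remember the result.
	// When (or whether) to evict entries is exactly the open question above.
	sec, err := s.fetch(namespace, name)
	if err != nil {
		return nil, err
	}
	s.mu.Lock()
	s.cache[key] = sec
	s.mu.Unlock()
	return sec, nil
}
```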
@vishh would you mind reviewing this? |
The proposal SGTM. Keeping the cache updated in the background will help. |
@vishh sure, I will keep the cache updated in the background. |
It would be great if this worked as the basis for the LOAS daemon from #2209, fetching secrets. The opaque proxying part would happen later, perhaps moving into a separate process. |
By the way, I just learned that kubelet talks to the apiserver to get the secret again for every pod on every sync. http://kubernetes.io/docs/user-guide/secrets/ This is quite undesirable, especially if pods all use the same secret. /cc @pmorie |
This is a source of huge load in large clusters. This has to be solved to make the cluster scalable to 5k. Regarding the proposal, there is a big hole, which is how we are updating secrets in the cache. I think that, short term, what we can do is that:
So how we should solve it is that: |
In the meantime - I've refactored the code to create a SecretManager interface - it will be easier to implement it with #36984 |
Really glad to see this :) |
I guess we can create a watch per secret, or something else if that would be better. The current logic is unmaintainable from the scalability point of view. |
@yujuhong - if you have a better suggestion, I'm completely fine with it, but we need to change it if we want to scale further (and this is our goal). Do you think that (at least short term) it's fine to modify my proposal so that:
This would also address the most common case of each pod just having a default service account. Does it make sense? If so, I'm happy to implement it. |
Don't get me wrong. I agree that this is a scaling/performance bottleneck and stated that clearly in the issue before. I was simply pointing out that there were concerns about letting the kubelet watch all secrets :)

One issue with your proposal is that kubelet doesn't know when a secret is updated (since we don't use "watch"). This is fine for existing pods since there is no guarantee on the update delivery latency yet, so we can rely on the 1 min refresh period. However, for new pods being added to kubelet, are we ok with them using potentially stale (up to 1 min) secrets, or do we need to fetch the latest secret?

I think it's worth figuring out whether we can tolerate this, but it's probably going to be a hard sell. If we absolutely need the latest secret, each new pod will incur a new GET, but will rely on the 1 min refresh after that. This is still better than what we have today, but not as great when batch creating pods. |
I think that for safety, let's assume we need the latest secret, and issue a GET whenever a pod is created (or updated?). |
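A rough sketch of how the 1-minute refresh plus a forced GET on pod creation could fit together (the TTL value, the RegisterPod hook, and all names are assumptions drawn from this thread, not the merged implementation):

```go
package secret

import (
	"sync"
	"time"
)

// cachedSecret pairs secret data with the time it was fetched.
type cachedSecret struct {
	data      map[string][]byte
	fetchedAt time.Time
}

// ttlSecretCache serves secrets from a local cache, re-fetching a given secret
// at most once per TTL (one minute in the discussion above).
type ttlSecretCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]cachedSecret
	fetch func(namespace, name string) (map[string][]byte, error)
}

// GetSecret returns cached data if it is still fresh, otherwise it re-fetches.
func (c *ttlSecretCache) GetSecret(namespace, name string) (map[string][]byte, error) {
	key := namespace + "/" + name
	c.mu.Lock()
	item, ok := c.items[key]
	c.mu.Unlock()
	if ok && time.Since(item.fetchedAt) < c.ttl {
		return item.data, nil // fresh enough: no apiserver traffic
	}
	return c.refresh(namespace, name)
}

// RegisterPod is called when a pod is created or updated; it bypasses the TTL
// so the new pod never starts with data older than its own creation.
func (c *ttlSecretCache) RegisterPod(namespace, secretName string) error {
	_, err := c.refresh(namespace, secretName)
	return err
}

func (c *ttlSecretCache) refresh(namespace, name string) (map[string][]byte, error) {
	data, err := c.fetch(namespace, name)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.items[namespace+"/"+name] = cachedSecret{data: data, fetchedAt: time.Now()}
	c.mu.Unlock()
	return data, nil
}
```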
Looks good to me overall. Demoted to p1 because this issue is not urgent. |
@pmorie what is the current lag to update a configmap/secret volume from when it updates in the API? I want to be really careful not to make the kubelet depend on being able to watch all secrets (or config maps… I assume they have the same issues here). Our eventual goal (in order to reduce the impact of a compromised node) is to limit the kubelet's access to just the secrets mounted into pods scheduled to that node. Implementing watch on a virtual resource based on the intersection of multiple resources is really tough (@deads2k has done it, but it's thorny and usually comes with caveats) |
@liggitt, @wojtek-t's proposal in #19188 (comment) doesn't involve watching secrets IIUC. The secret manager will simply get the secrets for pods assigned to the node every minute. This allows one GET every minute to serve all pods using the same secret. |
Quick math for the current state: 150k Pods, each getting at least one secret every 60s, which means 2.5k QPS on the master, which is a lot. If we have one GET every 60s for every Kubelet, it's ~80 QPS, but I don't know how to do this kind of GET in the system currently in any other way than what @wojtek-t proposed, i.e. we issue a single GET per secret. |
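Spelling out the arithmetic behind those numbers (the ~5,000-node figure is an assumption implied by the 80 QPS estimate; it is not stated in the comment):

```latex
\underbrace{\frac{150{,}000 \times 1\ \text{GET}}{60\ \text{s}}}_{\text{today: one GET per pod per minute}} = 2{,}500\ \text{QPS},
\qquad
\underbrace{\frac{\sim 5{,}000 \times 1\ \text{GET}}{60\ \text{s}}}_{\text{one GET per kubelet per minute}} \approx 83\ \text{QPS}
```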
@liggitt - if you're worried about a compromised Kubelet, I don't think the default Kubelet behavior is an issue at all. Someone with access to a Kubelet can currently just get (or list) any Secret they want. Once you introduce additional constraints on which secrets Kubelets can get, maybe we can just apply this policy to lists as well (i.e. when a Kubelet is listing all Secrets, it'll get only the ones used by Pods assigned to it)? |
@liggitt - if a node (kubelet) is compromised, it can send any requests to the apiserver anyway, so I don't fully understand this. Anyway - in my current proposal the kubelet keeps a cache only of the secrets that pods on that node are using, and we are not listing/watching any other secrets. We are just avoiding issuing GETs for the same secret multiple times, which should already be a clear gain. IIUC this solves your concerns. |
Fixing the authorizer to more tightly protect individual resources is something we're interested in. We shaped node identities to enable us to do it, but it's an after-RBAC objective. We just don't want to have features in the kubelet that we know will violate the eventual rules, and watching all secrets runs afoul of that. |
@deads2k - what do you think about Wojtek's proposal of making a single request per X sec per Secret instance (not per Secret reference)? |
That sounds fine from an API authorization perspective. |
@liggitt @pmorie - in the initial version of my PR, my purpose was to introduce this SecretManager and integrate it into Kubelet. Now, I would like to extend it to secrets mounted as volumes. But I cannot figure out the path via which secrets from volumes are periodically updated. Can you please point me to the place where secrets are being periodically re-read from the apiserver? The only place I can see issuing a GET is this one: Any help is appreciated. |
OK, I figured it out. So what is happening is that:
I think that in order to fix this, we should introduce some dedicated logic to the volume manager reconciler that will be:
Once the SecretManager interface is finally merged, that should be a relatively small change.
(same goes for configmaps as well) |
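One possible shape for the reconciler change described above, where the secret volume code asks a cached manager for the data instead of issuing its own GET on every remount (all names here are illustrative; the real volume plugin and reconciler APIs differ):

```go
package volume

import "fmt"

// SecretGetter is the piece of the secret manager the volume plugin needs:
// a cached lookup instead of a direct apiserver GET.
type SecretGetter interface {
	GetSecret(namespace, name string) (map[string][]byte, error)
}

// secretVolumeMounter is a stand-in for the secret volume plugin's mounter.
type secretVolumeMounter struct {
	podNamespace string
	secretName   string
	secrets      SecretGetter
}

// SetUp re-materializes the secret volume contents. During the periodic
// remount done by the volume manager reconciler, the data comes from the
// kubelet-local cache; only the manager's own refresh talks to the apiserver.
func (m *secretVolumeMounter) SetUp(writeFile func(path string, data []byte) error) error {
	data, err := m.secrets.GetSecret(m.podNamespace, m.secretName)
	if err != nil {
		return fmt.Errorf("couldn't get secret %s/%s: %v", m.podNamespace, m.secretName, err)
	}
	for name, content := range data {
		if err := writeFile(name, content); err != nil {
			return err
		}
	}
	return nil
}
```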
If I'm not mistaken, configmaps are (slightly) less problematic, because by default Pods don't have any (unlike Secrets, where every Pod has the default ServiceAccount token). |
Yes. We should do the same for config maps, but that's slightly less urgent. |
is there a related issue to mirror this for configmaps? really want to keep parity for those objects |
Good point - I'm not aware of any other issue. |
@wojtek-t, please also update the doc to reflect the changes to the secret updates. Thanks! |
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
Optimize secret manager to refresh secrets from apiserver cache
Ref kubernetes#19188 @liggitt
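For context on what "refresh secrets from apiserver cache" means: a GET issued with ResourceVersion "0" allows the apiserver to answer from its watch cache instead of reading through to etcd. A minimal sketch, using today's client-go import paths rather than the tree as it was then, and with a function name of our own choosing:

```go
package secret

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getSecretFromAPIServerCache refreshes a secret with ResourceVersion "0",
// which lets the apiserver serve the request from its watch cache instead of
// reading from etcd, keeping the periodic per-secret refreshes cheap.
func getSecretFromAPIServerCache(ctx context.Context, client kubernetes.Interface, namespace, name string) (*v1.Secret, error) {
	return client.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{ResourceVersion: "0"})
}
```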
@yujuhong - I've sent out a doc update in: kubernetes/website#2372 |
opened #41379 to track configmaps |
FYI, we should add a secret manager to store the pull secrets for different namespaces, so that for duplicate secrets there is no need to talk to the apiserver. I am actually working on this manager, but I have no idea when to remove a secret.