
Key/Cert rotation for Kubernetes clients. #4672

Closed
erictune opened this issue Feb 20, 2015 · 31 comments · Fixed by #79083
Assignees
Labels
area/client-libraries area/security kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/auth Categorizes an issue or PR as relevant to SIG Auth.

Comments

@erictune
Member

We may want to rotate keys/certs periodically for things that use pkg/client.

  • we might rotate the master's certs when they expire; we don't want clients to fail during that transition
  • Pod using automatically generated credentials in a secrets volume.
  • kubelet/kube-proxy using a per-node token or x509 keypair

Suggest we assume that, whatever key rotation plan is used (out of scope for this bug), old and new tokens/keypairs will always have overlapping lifetimes. If not, we'd need to teach .kubeconfig to hold a list of creds and teach pkg/client to retry. (I prefer the former option.)

Suggest pkg/client watches the .kubeconfig file and any file mentioned by it, using inotify.

@deads2k @liggitt @roberthbailey

@vmarmol vmarmol added area/security area/client-libraries priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Feb 20, 2015
@liggitt
Member

liggitt commented Feb 20, 2015

the .kubeconfig loader -> client.Config -> client.Client transformation steps are currently lossy... they transform and return an object disconnected from the config source

@erictune
Member Author

Ugh.

@alex-mohr alex-mohr added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Mar 19, 2015
@roberthbailey roberthbailey added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Sep 23, 2015
@roboll
Contributor

roboll commented Nov 24, 2015

Has there been any traction on this? I'd be willing to take a look if no one is working on it, or plans to, in the next few days.

@roberthbailey
Contributor

Given the upcoming (US) holiday, I doubt anyone is going to be working on this in the next few days. Go for it!

@liggitt
Member

liggitt commented Nov 24, 2015

It might be helpful to discuss the approach you plan to take before a lot of code gets written... how kubeconfigs will remain connected to resulting client configs, how clients built from configs will be rebuilt (or have transports swapped) on update, etc.

@roboll
Contributor

roboll commented Nov 24, 2015

@liggitt sure, going to take a look today and come up with a proposal.

@roboll
Contributor

roboll commented Nov 24, 2015

OK, did a little digging, want to circle back for input on a few things.

platform compatibility

The best approach for this is probably to use https://github.com/go-fsnotify/fsnotify, whose platform support is listed at https://github.com/go-fsnotify/fsnotify#file-system-notifications-for-go. As far as I can tell the limitation won't affect kubelet, kube-apiserver, kube-scheduler, or kube-controller-manager, which all run on linux only anyway. kubectl and other code consuming the client api may be affected, however, so we can gate this with a config option and change the default on the aforementioned components to request the watch.
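
For illustration, here is a minimal sketch of what such a gated file watch could look like with fsnotify (now published as github.com/fsnotify/fsnotify); the watchFiles helper and its onChange callback are hypothetical names for this sketch, not existing Kubernetes APIs:

package rotation // hypothetical package, for illustration only

import (
    "log"

    "github.com/fsnotify/fsnotify"
)

// watchFiles watches the given paths and invokes onChange whenever one of
// them is written to or replaced. Components that do not want the watch
// (e.g. kubectl on an unsupported platform) simply never call it.
func watchFiles(paths []string, onChange func(path string)) error {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    for _, p := range paths {
        if err := watcher.Add(p); err != nil {
            watcher.Close()
            return err
        }
    }
    go func() {
        defer watcher.Close()
        for {
            select {
            case ev, ok := <-watcher.Events:
                if !ok {
                    return
                }
                // Write/Create/Rename covers in-place edits as well as the
                // "write temp file, rename over target" pattern used by
                // atomic updaters (including secret volume updates).
                if ev.Op&(fsnotify.Write|fsnotify.Create|fsnotify.Rename) != 0 {
                    onChange(ev.Name)
                }
            case err, ok := <-watcher.Errors:
                if !ok {
                    return
                }
                log.Printf("watch error: %v", err)
            }
        }
    }()
    return nil
}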

runtime replacement on change

The relationship between the resulting client and the kubeconfig is completely severed when unversioned.New(*Config) (ref) copies the config argument and passes it to sub-clients individually. This makes it difficult to watch for updates on the config and propagate them.

Instead, if we compromise to only watch the files mentioned in the kubeconfig (i.e. changes to the certificates mentioned by path in kubeconfig, as opposed to watching kubeconfig itself for changes), we may be able to integrate the watch with the client itself in RESTClientFor, which is at the base of all of the sub-clients of Client. For example, here, we could dispatch a goroutine that retains a pointer to the client and atomically updates its transport when changes to the files are detected.
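
As a rough sketch of that idea (the swappableTransport type below is hypothetical, not part of pkg/client): a transport wrapper holds the current http.RoundTripper behind a lock, so a background goroutine driven by the file watch can rebuild it from the on-disk cert/key and swap it in without rebuilding the client.

package rotation // hypothetical package, for illustration only

import (
    "crypto/tls"
    "net/http"
    "sync"
)

// swappableTransport lets a background goroutine replace the underlying
// transport (e.g. after a client cert on disk has been rotated) while
// requests keep flowing through the same client.
type swappableTransport struct {
    mu sync.RWMutex
    rt http.RoundTripper
}

func (s *swappableTransport) RoundTrip(req *http.Request) (*http.Response, error) {
    s.mu.RLock()
    rt := s.rt
    s.mu.RUnlock()
    return rt.RoundTrip(req)
}

// reload rebuilds the TLS transport from the cert/key paths referenced by the
// kubeconfig and swaps it in; requests already in flight keep the transport
// they started with.
func (s *swappableTransport) reload(certFile, keyFile string) error {
    cert, err := tls.LoadX509KeyPair(certFile, keyFile)
    if err != nil {
        return err
    }
    newRT := &http.Transport{
        TLSClientConfig: &tls.Config{Certificates: []tls.Certificate{cert}},
    }
    s.mu.Lock()
    s.rt = newRT
    s.mu.Unlock()
    return nil
}

The goroutine dispatched from RESTClientFor would then call reload whenever the watcher reports a change to the cert or key file.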

Otherwise, to address changes in kubeconfig itself, i.e. embedded certificate/key data, tokens, urls, etc., there is a bit more design work to figure out how to either carry a reference to the original config source within the clients, or somehow maintain references through the copy operations.

@liggitt @roberthbailey what do you think?

@erictune
Member Author

@deads2k who wrote the client builders.

@erictune
Member Author

There are two main cases:

  1. kubectl being used on your laptop:
    • token, certs, and/or password are typically stored inline in .kube/config. So, you can't use the "watch just the files" method you suggest.
    • no automatic mechanism for feeding new creds/certs to your laptop anyway.
    • suggested approach is to require the user to fix the config and re-run kubectl.
    • since kubectl on your laptop is typically used interactively, and for short durations, having to re-run it is not so bad.
  2. go client running in a pod:
    • service account token is stored in its own file, so you can inotify watch it (see the sketch after this list).
    • not sure how certs work currently, that needs to be checked.
    • since pods are typically long running, having to restart it is potentially more annoying.
    • however, secrets are currently specified as immutable after Pod creation, so we would need to change that.
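
To make case 2 concrete, here is a minimal sketch (the fileTokenTransport type is hypothetical, not how pkg/client works today): rather than capturing the token once at client construction, re-read it from its file on each request, so a rotated token is picked up without restarting the pod.

package rotation // hypothetical package, for illustration only

import (
    "fmt"
    "net/http"
    "os"
    "strings"
)

// fileTokenTransport re-reads the bearer token file on every request, so a
// token that has been rotated on disk is used immediately.
type fileTokenTransport struct {
    tokenFile string            // e.g. /var/run/secrets/kubernetes.io/serviceaccount/token
    base      http.RoundTripper // underlying TLS transport
}

func (t *fileTokenTransport) RoundTrip(req *http.Request) (*http.Response, error) {
    token, err := os.ReadFile(t.tokenFile)
    if err != nil {
        return nil, fmt.Errorf("reading bearer token: %w", err)
    }
    // Clone before mutating headers; RoundTrippers must not modify the
    // caller's request.
    r := req.Clone(req.Context())
    r.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))
    return t.base.RoundTrip(r)
}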

@erictune
Member Author

suggest attacking number 2 first.

@deads2k
Contributor

deads2k commented Nov 24, 2015

Thinking through the downstream impacts, I'd like to see this developed with a mechanism that calls a registered function to decide how to handle expired credentials. This would allow us to do things like:

  1. use a service account to run a controller
  2. discover SA token was compromised
  3. delete compromised SA token
  4. wait for a new token to be created for the SA
  5. update the client/transport to use the new SA token

That means I have cases where I want to react to certain kinds of rejections and where I want to proactively push new credentials.
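
A hedged sketch of that registered-function idea (names such as retryOnUnauthorized and onReject are made up for illustration): a transport that, on a 401, invokes a caller-registered handler to refresh credentials and then retries the request once. It assumes the request body is nil or replayable.

package rotation // hypothetical package, for illustration only

import (
    "net/http"
)

// retryOnUnauthorized gives a caller-registered handler a chance to react to
// rejected credentials (e.g. fetch a newly minted SA token) before retrying.
type retryOnUnauthorized struct {
    base      http.RoundTripper
    onReject  func() error        // registered by the controller; refreshes creds
    authorize func(*http.Request) // applies the current credentials to a request
}

func (t *retryOnUnauthorized) RoundTrip(req *http.Request) (*http.Response, error) {
    first := req.Clone(req.Context())
    t.authorize(first)
    resp, err := t.base.RoundTrip(first)
    if err != nil || resp.StatusCode != http.StatusUnauthorized {
        return resp, err
    }
    // Credentials were rejected: let the registered handler run (steps 2-5
    // above), then retry once with whatever it installed.
    resp.Body.Close()
    if err := t.onReject(); err != nil {
        return nil, err
    }
    retry := req.Clone(req.Context())
    t.authorize(retry)
    return t.base.RoundTrip(retry)
}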

@roboll
Contributor

roboll commented Nov 24, 2015

@erictune generally agree with your points in 1), except for kubectl used as an authenticated port-forwarding proxy within a pod (I've seen this recommended in the kube docs somewhere, but can't seem to find it now). In that case it would be ideal if it used the same mechanism the other consumers of the client api use, and since it's using the same client code, that should be simple and could be enabled with a flag.

as for 2), credentials for service accounts are stored in their own files, but kubeconfig also allows them to be embedded within the yaml file itself. This is the case for tokens, certs/keys, and ca certs. I believe that the config should be immutable, and that in order to have changing credentials they should be referenced from the kubeconfig file, not embedded within it. This allows handling the changes further down the hierarchy without having to maintain a reference to configuration-loading code within the client itself.

Consider the use case of non-service-account credentials: for example, a cert/key mounted from hostPath into a manifest pod for the authenticated kube master components (scheduler and controller-manager), or a certificate/key on host disk used by kubelet for master authentication. IMHO it's an unlikely scenario that a certificate reference (a path in kubeconfig) needs to be updated; more likely, a sidecar container or a host-level daemon will replace the contents of the file with a new certificate. We can pick that up with inotify, and it can be integrated with the client itself, disjoint from the kubeconfig code. The same mechanism would work when/if it becomes possible to update secret volumes as well.

@deads2k Handling rejection is probably much more complicated, e.g. for certs it implies knowledge of / an interface with a certificate authority or signing service. I like the idea of a function that re-reads the config, and that's generally what I was trying to say in my second proposal above, but it implies that a kubeconfig should be allowed to change over time while a server remains running.

I guess the question is, should a config be immutable throughout the life of a server, or should it be allowed to be modified? My stance is that a config should be immutable, but the references it makes to external files (certificates, keys, tokens, etc.) should be kept up to date with changes to those files.

@liggitt
Member

liggitt commented Nov 24, 2015

My stance is that a config should be immutable, but the references it makes to external files (certificates, keys, tokens, etc.) should be kept up to date with changes to those files.

I disagree... I wouldn't want to special case key/cert rotation... what if they are using token auth (which is specified in the kubeconfig)?

@erictune
Member Author

@roboll

Check out InClusterConfig and NewInCluster in
pkg/client/unversioned/helper.go

When this factory is used, there is no .kube/config file used at all.


@erictune
Member Author

And func (config DeferredLoadingClientConfig) ClientConfig() tries to use an inClusterConfig. I think kubectl calls that. Which means that kubectl as a proxy in a pod does not use .kube/config but instead uses the token at /var/run/secrets/kubernetes.io/serviceaccount/token.
Not sure what certs are used though.


@roboll
Contributor

roboll commented Nov 24, 2015

@erictune I see that, yeah. This case falls outside both monitoring a kubeconfig and monitoring the files mentioned within it, and it's also completely disjoint from the resulting client: all it ends up with is a string extracted from the file. Without integrating the config source into the client, it won't be possible to monitor that file for changes.

Also, I'm pretty certain tokens and certs are mutually exclusive; in this case it doesn't use certs. kubectl --config=mykubeconfig would, but based on this that's not relevant within the cluster.

@liggitt then maybe integrating the config loading and the refresh into the client is a way to get this a bit cleaner. (src)

type Client struct {
    *RESTClient
    *ExtensionsClient
    // TODO: remove this when we re-structure pkg/client.
    *DiscoveryClient

    ConfigSource func() *Config
}

type Reloadable interface {
    Watch() error
    Reload() error
}

func (c *Client) Watch() error {
    // fsnotify loop: reload the config on every change event.
    for {
        if err := c.Reload(); err != nil {
            return err
        }
    }
}

func (c *Client) Reload() error {
    cfg := c.ConfigSource()
    // TODO: update the necessary components (transports, sub-clients) from cfg.
    _ = cfg
    return nil
}

c := unversioned.New(func() *Config {
    //load configuration here
})
go c.Watch()

Something like this could extend to reactive credential updates if the Reload func were exposed publicly, and it should be able to cover the cases of in-cluster config, kubeconfig file changes, and files referenced by kubeconfig.

@deads2k does this align with what you were thinking?

@liggitt
Member

liggitt commented Nov 24, 2015

Also, I'm pretty certain tokens and certs are mutually exclusive

Nit, but tokens are not mutually exclusive with certs. Certs are TLS-level; tokens (or basic auth) are header-level. Some providers require both.

@deads2k
Contributor

deads2k commented Nov 25, 2015

@deads2k does this align with what you were thinking?

Structurally, I don't think I'd want the Config writeable by everyone with a handle to the Client. I could probably be convinced of a RefreshTransport method, and I don't have an issue with constructing a client with a ConfigFn func() *Config.

@lavalamp is currently in the client code. He may have a stronger opinion.

I do still want the ability to register some sort of error handler for specific HTTP status codes in the RESTClient. I think a simple error handler, plus this push mechanism would be sufficient to satisfy my need to handle expired credentials automatically.

@roberthbailey roberthbailey removed this from the v1.7 milestone May 30, 2017
@roberthbailey roberthbailey removed the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Aug 29, 2017
@k8s-github-robot

[MILESTONENOTIFIER] Milestone Removed

@erictune @jcbsmpsn @liggitt

Important:
This issue was missing labels required for the v1.8 milestone for more than 7 days:

kind: Must specify exactly one of [kind/bug, kind/cleanup, kind/feature].
priority: Must specify exactly one of [priority/critical-urgent, priority/important-longterm, priority/important-soon].

Removing it from the milestone.

Additional instructions available here. The commands available for adding these labels are documented here.

@jamiehannaford
Contributor

Now that kubelet supports auto-rotation of certs using the certificate signing request (CSR) API (see #45059), can other clients do the same thing?

One potential issue is how CSRs will be approved for non-node clients. After #45619, the following criteria must be met for a CSR to be auto-approved:

requests from a node for a TLS client certificate for itself are approved if the CSR creator has create permission on the certificatesigningrequests resource and the selfnodeclient subresource in the certificates.k8s.io API group

Hopefully components can add this rolebinding when they're first installed on the system.

@mikedanese mikedanese assigned mikedanese and unassigned jcbsmpsn Dec 4, 2017
@liggitt liggitt added kind/feature Categorizes issue or PR as related to a new feature. and removed milestone/removed labels Jan 6, 2018
@jpiper

jpiper commented Jan 12, 2018

Just hit this on a cluster I set up a year ago, where the certs used for service account token validation had expired.

Really frustrating to fix, as you have to delete all the service account tokens from secrets to get them to refresh, and you then have to restart any pods that need any of those secrets.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2018
@mikedanese mikedanese removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 11, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 9, 2018
@george-angel
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 9, 2018
@mikedanese mikedanese added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Oct 9, 2018
@yastij
Member

yastij commented Dec 27, 2018

@mikedanese @liggitt - it seems that one way to go with this is:

  • if clients (kube-scheduler, controller-manager, etc.) set --rotate-certificates and --bootstrap-kubeconfig, the client basically adopts the same behavior as kubelet client cert rotation.

  • for the cert/key pair that signs/validates service account tokens, we can rely on the same workflow.

As for the logic, we might want to factor it under client-go, so every client can use the cert rotation workflow.

This has the benefit of matching the behavior that already exists for the kubelet.
