Control plane bootstrapping order AKA we need a run-level concept #54522
Description
Background:
We are adding two extension mechanisms to the Kubernetes control plane: initializers and admission webhooks. If, for example, the webhooks are configured but not actually running in the cluster, then the cluster is broken until an administrator can fix it. To make it possible to avoid this situation, we're going to let a webhook be gated on a selector that matches the labels on the namespace containing the item under consideration. This should make it possible to construct a set of namespace labels that lets the namespaces hosting the critical webhooks stay operational when the webhooks aren't running. (I will add a link to the design when it is published.)
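To make the gating idea concrete, here is a minimal sketch of how a namespace selector could decide whether a webhook is consulted for an object. It is not the published design: the `Requirement` type, the `webhook_applies` function, and the `runlevel` label key are all hypothetical, and the operator semantics are modeled on the set-based label selector operators Kubernetes already has (`In`, `NotIn`, `Exists`, `DoesNotExist`).

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Requirement:
    """One selector clause (hypothetical type, for illustration only)."""
    key: str
    operator: str  # "In", "NotIn", "Exists", or "DoesNotExist"
    values: Tuple[str, ...] = ()


def requirement_matches(req: Requirement, labels: Dict[str, str]) -> bool:
    """Evaluate one clause against a namespace's labels."""
    present = req.key in labels
    if req.operator == "In":
        return present and labels[req.key] in req.values
    if req.operator == "NotIn":
        # A missing key also satisfies NotIn.
        return not present or labels[req.key] not in req.values
    if req.operator == "Exists":
        return present
    if req.operator == "DoesNotExist":
        return not present
    raise ValueError(f"unknown operator: {req.operator}")


def webhook_applies(selector: Sequence[Requirement],
                    namespace_labels: Dict[str, str]) -> bool:
    """The webhook is consulted only if every clause matches (clauses are ANDed)."""
    return all(requirement_matches(r, namespace_labels) for r in selector)


# Assumption: namespaces hosting critical webhooks carry runlevel "0" or "1",
# and the webhook skips them so they keep working while the webhook is down.
skip_low_run_levels = [Requirement("runlevel", "NotIn", ("0", "1"))]
```

With this selector, `webhook_applies(skip_low_run_levels, {"runlevel": "0"})` is false, so objects in that namespace bypass the webhook, while an unlabeled namespace or one labeled `runlevel: "2"` is still gated.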
What we need:
We're looking for documented best practices around this. We imagined building a "run level" system out of labels on namespaces. A complete solution should:
- Cover how many run levels there are
- Cover what components go in which run level
- Analyze the functionality of the current controller-manager; it may need to be split into binaries or modes that are in different run levels
- Draw some inspiration from Brian's layers doc.
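As a rough illustration of what a run-level scheme might need to express, the sketch below groups components by run level and encodes one possible safety rule. Everything here is an assumption made for discussion: the component names are placeholders, the number of levels is arbitrary, and the rule "a webhook may only gate namespaces at a strictly higher run level than its own" is just one candidate policy, not something this issue has decided.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_run_level(assignments: Dict[str, int]) -> List[List[str]]:
    """Return components grouped by run level, lowest level first.

    `assignments` maps a (placeholder) component name to its run level.
    Components in the same level are sorted by name for stable output.
    """
    levels = defaultdict(list)
    for name, level in assignments.items():
        levels[level].append(name)
    return [sorted(levels[level]) for level in sorted(levels)]


def may_gate(webhook_level: int, namespace_level: int) -> bool:
    """Assumed policy: a webhook never gates its own level or a lower one,
    so the namespaces it depends on can come up without it."""
    return namespace_level > webhook_level


# Placeholder assignments; which real components belong where is the open question.
example = {"component-a": 1, "component-b": 0, "webhook-c": 1}
startup_order = group_by_run_level(example)
```

Here `startup_order` is `[["component-b"], ["component-a", "webhook-c"]]`, and `may_gate(1, 1)` is false, which keeps a level-1 webhook from blocking the namespace it runs in.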
We think the cluster lifecycle SIG is probably the best place for this to be worked out.
(This is from a meeting between myself, @cheftako, @deads2k, @smarterclayton, @liggitt, @caesarxuchao, and @jagosan.)