Canarying mechanism for ConfigMap #20200
I asked for something here too, but it was deferred at the time. I proposed that the ConfigMap hold 2 maps, an indicator of which map is current and which one is "new", and an int field with values 0-100000, representing milli-percents (or maybe use a real power of 2 for easier clients). Consumers who want to canary can generate a random number in this range. If their number is < the milli-percentage, they use the "new" map; else they use the "current" map. To canary, you push the new map and slowly ramp the percentage. When you hit 100% you flip the pointer and reset the percentage. This could be baked into the ConfigMap volume. Obviously this is pushing the bounds of ConfigMap, and maybe it should be done in a different way.
While it partly solves the other need I was going to mention (rate-limiting how fast new values are propagated to pods), that approach has the drawback of not being very deterministic. It might take a while before any instance canaries the new setup, it's hard to tell who is using it and who isn't, and service health has to be correlated with the milli-percent. Worse, if a new configuration needs to be pushed in the middle of a large outage to get the service back on its feet, it would be hard to evaluate whether it's working.

Maybe a config controller could add metadata to the ConfigMap instead. It could be set up to look at a service or set of labels to identify the candidate pool, then write a "canary" field with the name(s) of the X lucky pod(s), as many as the user requested. On the other side, the kubelets for the chosen ones reload the configuration while all others wait. (The question then becomes what happens if new pods go live during this canarying phase; maybe we can't escape having two different configurations.) If the canaries are still healthy after Y minutes, the controller updates the "canary" field with new pods, with up to Z new ones in flight at any time (e.g. 5% or 10% of total pods), tracking health checks throughout. Kubelets might also report which version of the ConfigMap is active for a pod. This would be opt-in, of course.
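The controller's ramp-up step above can be sketched as a pure selection function. This is a hypothetical sketch, not Kubernetes code: `nextCanaries`, the sorted-order policy, and the pod names are all illustrative assumptions; `maxInFlight` stands in for the Z cap (e.g. 5-10% of the pool).

```go
package main

import (
	"fmt"
	"sort"
)

// nextCanaries picks the next batch of pods to flip to the new config:
// given the candidate pool, the set of pods already running it, and a
// cap on concurrent canaries, it returns up to maxInFlight new names.
func nextCanaries(pool []string, done map[string]bool, maxInFlight int) []string {
	sort.Strings(pool) // deterministic order, so reruns pick the same pods
	var batch []string
	for _, p := range pool {
		if len(batch) == maxInFlight {
			break
		}
		if !done[p] {
			batch = append(batch, p)
		}
	}
	return batch
}

func main() {
	pool := []string{"web-3", "web-1", "web-2", "web-4"}
	done := map[string]bool{"web-1": true} // web-1 already canaried and is healthy
	fmt.Println(nextCanaries(pool, done, 2)) // prints [web-2 web-3]
}
```

The real controller would call something like this after each Y-minute health window, stopping the rollout if any canary's health checks regress.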
Alternatively, make a new ConfigMap and do a controlled rolling update of the pods that consume it.
I agree with @thockin's last proposal. The right thing to do here is create a new ConfigMap and do a rolling update to switch to it, using the new Deployment API.
To be more clear: that's the recommended solution for the foreseeable future. If it won't work for you, please explain why and reopen the issue.
But the rolling update would restart all pods, correct? Unless we get smarter updates that, with cooperation of pods, can just reload data.
Ref #9043 re. in-place rolling updates. |
A bad configuration file can take down a whole service. Is the expectation that problematic ConfigMaps will get caught by the dev/staging/production promotion process? They're namespaced, I think. I can come up with the example of a very spectacular postmortem from 2007 where even that was not enough.
Would there be merit in a canarying mechanism? I'm assuming that currently configurations are pulled, not pushed, driven by watches. That's less than ideal, but I think one could have an election held where all potential "victims" pick the lucky one to try the new settings and, later, report on success. Even if this doesn't become a built-in feature, the documentation should at least point to the problem, best practices or possible options.
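The "victims elect the lucky one" idea can be made deterministic without any coordination: every pod runs the same pure function over the candidate set, so they all agree on the winner. A minimal sketch, assuming pods know their peers' names and the new ConfigMap's resource version (the function name and hashing scheme are inventions for illustration):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// electCanary returns the pod that should try the new settings first:
// each pod hashes every candidate name together with the new config's
// version, and the smallest hash wins. Every pod computes the same
// winner, so no messages need to be exchanged; a different config
// version elects a (likely) different pod.
func electCanary(pods []string, configVersion string) string {
	best := ""
	var bestHash uint32
	for _, p := range pods {
		h := fnv.New32a()
		h.Write([]byte(p + "/" + configVersion))
		if sum := h.Sum32(); best == "" || sum < bestHash {
			best, bestHash = p, sum
		}
	}
	return best
}

func main() {
	pods := []string{"web-1", "web-2", "web-3", "web-4"}
	fmt.Println("canary:", electCanary(pods, "configmap-v2"))
}
```

After the winner reports success (or times out), the same scheme could elect the next pod by excluding those already done.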