[WIP] Initial deployment lifecycle hooks proposal #33545
Conversation
@kubernetes/deployment ptal
* **Retry** - using this policy means that if a deployment lifecycle hook fails, the lifecycle Pod will be restarted until the deployment reaches `TimeoutSeconds` or the hook succeeds.
`TimeoutSeconds` is `ProgressDeadlineSeconds`, added in the perma-failed PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will remove it and use progressdeadline
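For context, a minimal sketch of what the failure-policy type could look like, modeled on OpenShift's `LifecycleHookFailurePolicy`; the names are assumptions, not the final API for this proposal:

```go
// LifecycleHookFailurePolicy describes the possible actions taken when a
// deployment lifecycle hook fails. Modeled on OpenShift's API; the names
// are assumptions, not part of this proposal's final shape.
type LifecycleHookFailurePolicy string

const (
	// Restart the hook pod until the deployment's deadline is reached or
	// the hook succeeds.
	LifecycleHookFailurePolicyRetry LifecycleHookFailurePolicy = "Retry"
	// Abort the rollout when the hook fails.
	LifecycleHookFailurePolicyAbort LifecycleHookFailurePolicy = "Abort"
	// Ignore the failure and continue the rollout.
	LifecycleHookFailurePolicyIgnore LifecycleHookFailurePolicy = "Ignore"
)
```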
### Default values

Initially the default `FailurePolicy` should be set to `Retry` until deployments can safely be rolled back automatically.
Link to the issue regarding automatic rollback.
ok
type RollingUpdateDeployment struct {
	// TimeoutSeconds is the time to wait for updates before giving up. If the
	// value is nil, a default will be used.
	TimeoutSeconds *int64
}
No need for this; `ProgressDeadlineSeconds` will be in the spec. You should mention that this proposal depends on the perma-failed PR.
updated.
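A minimal sketch of where the deadline would live instead, assuming the `ProgressDeadlineSeconds` field lands in `DeploymentSpec` as described in the perma-failed proposal (field placement here is an assumption):

```go
type DeploymentSpec struct {
	// ... other fields elided ...

	// ProgressDeadlineSeconds is the maximum time in seconds for a deployment
	// to make progress before it is considered to be failed. Lifecycle hooks
	// would share this deadline rather than carrying a separate TimeoutSeconds.
	ProgressDeadlineSeconds *int32
}
```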
Jenkins GCI GCE e2e failed for commit 7d193a9. Full PR test history.
Force-pushed: 7d193a9 to 6cf98c0
Jenkins verification failed for commit 6cf98c0. Full PR test history.
Because deployment processes typically involve coupling between state (assumed elsewhere) and code (frequently the target of the deployment), it should be easy for a hook to be coupled to a particular version of code, and easy for a deployer to use the code under deployment from the hook.
Add a background section here including:
- Why we did this on OpenShift and lessons learned
- How this might be applicable to pet sets
- Why this may or may not be applicable to daemon sets.
Kubernetes provides container lifecycle hooks for containers within a Pod. Currently, post-start and pre-stop hooks are supported. For deployments, post-start is the most relevant. Because these hooks are implemented by the Kubelet, the post-start hook provides some unique guarantees:
Can't I use an init container?
+1 to this question
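For comparison, a minimal sketch of the init-container alternative being asked about, assuming current `k8s.io/api` packages; the pod name, image, and command are placeholders:

```go
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// migratePod runs migration logic in an init container: the kubelet runs the
// init container to completion (retrying on failure) before starting the app
// container, so the pod never becomes ready until the migration succeeds.
var migratePod = corev1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "app-with-migration"},
	Spec: corev1.PodSpec{
		InitContainers: []corev1.Container{{
			Name:    "migrate",
			Image:   "example/app:v2",               // placeholder image
			Command: []string{"rake", "db:migrate"}, // placeholder command
		}},
		Containers: []corev1.Container{{
			Name:  "app",
			Image: "example/app:v2", // placeholder image
		}},
	},
}
```

The trade-off relative to a deployment-level hook: an init container runs on every start of every replica, not once per rollout.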
As mentioned on IRC, it may be useful to do custom deployment strategies (deployment executed by a non-core controller) first, which would allow this to be built as an extension controller (using observe or higher-level tools) that applied hooks before / after deployment. I think process-driven deployments are something it should be possible to build easily, and this is really a variant of a custom deployment.
If you can spawn a side proposal only covering how someone could define a custom strategy, we could have a multistage approach to hooks which makes Kube more extensible and reduces risk in the design process.
Ref #31571 re. custom controller. I like that approach, esp. in the short term (1.5/Q4). Review bandwidth is extremely limited right now. Also, I think hooks are more likely to be relevant to PetSet than to Deployment. Deployment is really aimed at stateless applications with lots of rollout flexibility.
* The hook is executed synchronously during the pod lifecycle.
PostStart is async, unless I'm missing something.
@bgrant0607 PostStart is definitely synchronous, according to the docs and my testing.
To clarify, it can start the entry point and the hook at the same time, but the hook has to finish running before the container is considered to be started.
* The status of the pod is linked to the status and outcome of the hook execution.
* The pod will not enter a ready status until the hook has completed successfully.
* Service endpoints will not be created for the pod until the pod has a ready status.
* If the hook fails, the pod's creation is considered a failure, and the retry behavior is
I think pod and container are confused here. Post-start is at the container level.
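For reference, a minimal sketch of a container-level post-start hook, assuming current `k8s.io/api` types (where the handler type is `LifecycleHandler`); the image and command are placeholders:

```go
import corev1 "k8s.io/api/core/v1"

// appContainer attaches a post-start exec hook. The kubelet may start the
// entrypoint and the hook concurrently, but the container is not considered
// started until the hook exits successfully, which in turn gates pod
// readiness and service endpoint creation.
var appContainer = corev1.Container{
	Name:  "app",
	Image: "example/app:v2", // placeholder image
	Lifecycle: &corev1.Lifecycle{
		PostStart: &corev1.LifecycleHandler{
			Exec: &corev1.ExecAction{
				// placeholder command
				Command: []string{"/bin/sh", "-c", "./warm-cache.sh"},
			},
		},
	},
}
```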
```go
type ExecNewPodHook struct {
	// (remaining fields elided in the diff context)
}
```
Why not a distinct pod or job spec?
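A sketch of the alternative being suggested, where the hook carries a full pod template instead of a command overlaid on the deployment's image; the type and field names are hypothetical (reusing the `LifecycleHookFailurePolicy` sketch above):

```go
import corev1 "k8s.io/api/core/v1"

// LifecycleHook is a hypothetical shape for a hook that runs an arbitrary
// pod rather than re-executing a command inside the deployment's own image.
type LifecycleHook struct {
	// FailurePolicy specifies the action to take if the hook pod fails.
	FailurePolicy LifecycleHookFailurePolicy
	// PodTemplate fully describes the hook pod, decoupling its image,
	// command, and volumes from the deployment's pod template.
	PodTemplate *corev1.PodTemplateSpec
}
```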
Re: need for hooks
Right now 50-60% of our ootb web app examples that use a DB use the hook to
initialize schema, so this is very relevant for "I want to build something
self-contained that dtrt". Somewhere upwards of 70% of the ruby apps
running on openshift online use the hook as a rails migration step.
In enterprise environments, we've seen hooks used extensively with JEE apps
to run mid-migration updates with recreate strategies (clear cache, etc.).
I would say those two examples are the most critical for ease of use in the use
cases we target, but both of those examples benefit because of openshift's
triggered deployments (via docker builds or tagged images resulting in
images being changed automatically). If you don't have that capability,
you are likely using custom scripting or Jenkins to drive your deployment
updates, and you probably have a place to insert the hook logic.
So a real philosophical question is - if you don't have a development flow
that is innate / integrated with Kube, you'll be able to do hooks if you
want. The hooks being on deployments gives a declarative tool for running
something like pre-start but at a larger logical level for anyone
performing deployment spec mutations - is that innately valuable to
everyone, or just to those with integrated flows?
I'd like to demonstrate a generic trigger controller (annotation that can
result in a mutation of a deployment / daemonset / petset based on the
latest level state of some external resource like a tagged image or git
repo or URL) but I'm trying to close the partial object retrieval story
first so I can do it more efficiently. But even that sort of tool is not
fundamental, since many other workflows exist. The reason we focus on
triggers is because they emphasize periodic refresh of cluster state (new
db image) without user intervention, but without failed deployments those
triggers could push resources into failing states. But as we talk about
this a lot it really does highlight how OpenShift's dev workflow is
opinionated and reinforcing (image resources -> triggers -> hooks -> failed
deployments). If each of those pieces hangs on the other, it makes each
individual piece in isolation less valuable. The fact we run the
deployment in a pod also reinforces that - it's easy for someone to tweak
the default deployment logic because we already have the concept of custom
code running on the users' behalf in a pod.
Perhaps we need to assess how someone like OpenShift can correctly
implement the opinionated, reinforcing, easy deployment workflow on top of
deployments first (which goes to the custom deployment question). That's
more "custom workflow on top of Kube". While that complicates the movement
of openshift users to deployments, it empowers others to build their own
workflows.
@smarterclayton Thanks much for the detailed background. Do the web-db cases put the DB in the same container/pod as the web app?
They typically do not - schema and "execution" logic come from web app, db
ref #14512 |
Adding label `do-not-merge` because the PR changes docs that are prohibited to auto-merge.
[APPROVALNOTIFIER] Needs approval from an approver in each of these OWNERS files. We suggest the following people:
This PR hasn't been active in 90 days. Closing this PR. Please reopen if you would like to work towards merging this change, if/when the PR is ready for the next round of review. cc @brendandburns @erictune @mfojtik @thockin. You can add the 'keep-open' label to prevent this from happening again, or add a comment to keep it open another 90 days.
@smarterclayton @Kargakis
This proposal is a summary collected from various sources.
A PoC branch (heavily WIP): https://github.com/mfojtik/kubernetes/tree/deployment-hooks