Allow users to wait for conditions from kubectl and using the API #1899
Also @ghodss @bgrant0607
@smarterclayton What is described here, specifically in the CLI examples, speaks to the various individual pieces, but is there a need to wait for a group of related resources to achieve a desired state? Examples: a service, a pod, and a controller; or two different pods that work in conjunction with one another.
I doubt that's the first thing we would need. In many cases you can just wait linearly (wait X, then wait Y). Can you describe some concrete examples of multi-pod coordination where waiting would be needed?
@smarterclayton I'm thinking of the case where I apply a config and I want to wait until the results of that operation are 'ready'. The only other example I can think of for multi-pod is maybe a messaging system of some kind.
Config is special, because config is just applying the same action to each individual component, or defining your own ready state at the end. If you need multi-step behavior, you already implicitly need a way to describe sequential, stepwise logic, and that is (in the short term) where you punt to shell, or come up with a way to express readiness more simply in your app.
I'm supportive of building in a mechanism to probe container/pod readiness (#620), similar to liveness. This information is needed by a wide variety of systems and tools, including services and perhaps replication controllers. Similarly, we'll need a way to aggregate per-pod readiness for sets of pods identified by a label selector. This could perhaps be returned in service and/or replication controller status.

Requirements such as requiring N instances to be ready are also very common. At least, all systems that cause disruptions (e.g., rolling updates) need to be aware of them. I'd express this as an independent disruption policy with a label selector. Because an absolute N isn't very friendly to auto-scaling, there should also be ways to specify a percentage ready or a maximum not ready.

I can also see the utility of waiting for a variety of conditions in the client, including various flavors of readiness and also termination. Is that template syntax some standard, or made up?
The template listed is a mode of output using Go templates from kubecfg and kubectl. I'm not sure it's the best tool, but I doubt we want to invent a custom query syntax. The template is probably how you'd script this today in bash without requiring an external tool like jq. Having the server support simple label-query conditions for readiness would simplify common clients, at the expense of flexibility. Would prefer never to wait on the server, of course.
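For illustration, a minimal sketch of the kind of template-based polling being described, assuming a placeholder pod named `mypod` and today's field names (the 2014-era API used different ones):

```bash
# Poll the pod phase via an output template until it reports Running.
# "mypod" is a placeholder; adjust the field path to your API version.
until [ "$(kubectl get pod mypod -o go-template='{{.status.phase}}')" = "Running" ]; do
  sleep 2
done
```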
I wonder if this can be broken out into its own command instead of building it into create, get, etc. It may not be quite as convenient, but I think the wins in simplifying the kubectl interface, the implementation, and providing more cohesive building blocks for people's scripts may be worth it. Maybe you could also define custom conditions in plugins or config files. WDYT?
Why could we not provide both? Waiting for the condition (or not) with a flag, as well as a separate command.
We could. I still think there's a tradeoff in making the interface, documentation, etc. simpler if they're separate commands that can be chained. At the very least, we could start with just a wait subcommand and then add it into the other subcommands if there's enough need.
I tend to prefer the flag syntax for its user-friendliness, but I agree with the arguments around cohesion and simplicity (of code) of a separate wait command. If we decide to go that path, we should rely on a consistent syntax pattern for expressing conditions.
Also, consistency around stdin: I think other places use `-f -` for stdin. I would suggest something other than `-` here, since that is used in legitimate ways elsewhere.
+1 for pod:running or pod=running. I think it also reads better and more clearly implies which thing should be in which desired state.
Trying to summarize the context of this discussion. Possible syntax and examples (a hedged sketch follows below):
• Does not seem to make sense to support 'wait' for all resource types
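The original syntax and example snippets were not preserved; this is a hypothetical reconstruction based on the surrounding comments (the `pod=running` form and the minimum/exactly keywords critiqued in the next comment). None of these flags are real:

```bash
# Hypothetical condition syntax under discussion (not a real kubectl flag set):
kubectl wait pod=running frontend-pod-1
kubectl wait rc=running frontend --minimum=2   # at least 2 pods running
kubectl wait rc=running frontend --exactly=3   # exactly 3 pods running
```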
Your minimum/exactly syntax is ad hoc, and I dislike ad-hoc syntax. What if we instead consider the Pod status codes here: pending, running, succeeded, failed? A ReplicationController is pending as long as it has not met its N. Now that I spelled it out, I don't like it so much.
I like your line of reasoning, Tim, about several different object types having common states.
When I did some of the original conditional waiting in the code (pkg/client/conditions.go), it was obvious there were categories of waits common enough that you could easily agree "yes, this is a valid thing to wait for". They weren't truly generic, but they potentially could be. I think service readiness is the complex part. Service readiness might be a concept we're able to discuss: could we define a readiness check type on a service that looks like (but is not exactly) what containers have? Needs X, needs Y, etc.? A key difference is that services transition between ready and not ready.
There are likely resources that have no meaningful wait. Each resource should justify whether it can wait.
Status implies that a resource has a desired state and a delta state, so it does seem that anything with Status is implicitly waitable.
Can you define what you mean by this? To me, wait should be "is this condition met" and "here's the maximum I'll wait" - nothing else.
The problem with these condition codes is that they are very coarse. If a replication controller is "pending", it means either it has not started acting yet or it has not yet reached its N. And what does it mean for a service to be ready? Some number of its endpoints being ready?
I mean here more something that would be internal to the client.
Ok. Being aware of the state machine of the resource type is required - if you ask for "Running" but the resource is in "Failed", you're right, you should stop. If you ask for "Failed" and it's in "Running", you keep waiting. While that couples us to knowing about the state machine, we specifically designed it to be simple enough to do this.
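A minimal shell sketch of that state-machine-aware behavior, assuming a placeholder pod name `mypod`: poll for the desired phase, but bail out if a terminal phase is reached first.

```bash
# Wait for Running, but stop early if the pod hits a terminal phase.
while true; do
  phase=$(kubectl get pod mypod -o jsonpath='{.status.phase}')
  case "$phase" in
    Running) break ;;                                          # desired state reached
    Failed|Succeeded) echo "terminal phase: $phase" >&2; exit 1 ;;
  esac
  sleep 2
done
```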
The Smith Resource Manager was presented at SIG Apps on Jul 24th. Smith handles readiness, for creation-order dependencies, by having a function for each resource type. For TPRs you can specify a field which, if present, indicates readiness. @ash2k, hope I got that somewhat right.
@erictune yes, that is correct. More information is in the readme: https://github.com/atlassian/smith. Please note that not all object kinds are supported right now, but it is trivial to add support. Also there are some other limitations (see issues). Anyone interested is welcome to contribute to the project - open issues/PRs.
A note on why this is so difficult to get right and should be on the server side: Smith's deployment readiness is wrong, because you can have the correct number of pods but not be within the availability condition (i.e., maxUnavailable can be violated and Smith will report ready). Given the complexity of these interactions, I think it's untenable to expect client authors to get this right every time.
And most people get this wrong, even people who are fairly familiar with deployments :) (no knock on Smith)
Would be fantastic to have this for running one-shot database migration jobs that need to run to completion before doing a rolling update of a web app. Guess I'll go hack up a poll/wait loop in the meantime ;-)
I know this is an old one, but here's another upvote. I've found another case where this would be quite useful: when running conformance tests inside a cluster, it would be nice to know when the job has completed.
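Both of these cases (a one-shot migration job and an in-cluster test job) map to waiting on the Job's Complete condition, which the `kubectl wait` command that eventually shipped can express directly; the job names here are placeholders:

```bash
# Block until the migration job finishes (or give up after 10 minutes).
kubectl wait --for=condition=complete job/db-migrate --timeout=600s

# Same pattern for an in-cluster conformance test job.
kubectl wait --for=condition=complete job/conformance --timeout=1h
```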
A related issue is that we don't make it easy to determine success or failure of a change, much less wait for it: #34363
Bumping this.
Damn, this issue was created in 2014. I need to wait for a container to be in the state Running; I'll return to a shell loop watching the state.
@zoobab also, you probably want to check the numbers as well (e.g., how many containers are actually ready), not just the phase.
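A sketch of the kind of shell loop being described, with the ready counts checked as well; `mypod` is a placeholder name:

```bash
# Poll until the pod phase reports Running.
until [ "$(kubectl get pod mypod -o jsonpath='{.status.phase}')" = "Running" ]; do
  sleep 2
done

# Running alone doesn't mean the containers are ready; check those too.
kubectl get pod mypod \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}={.ready}{"\n"}{end}'
```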
One more example of why kubectl is hard to integrate into automation scripts. Imagine that you run something like the command sketched below:
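The original snippet was not preserved; a plausible stand-in for the scenario described next, assuming an older kubectl where `run` created a Deployment (the name and image are placeholders):

```bash
# Create a workload from an image that doesn't actually exist in the registry.
kubectl run myapp --image=registry.example.com/myapp:latest

# Then a blocking status check hangs indefinitely rather than failing fast.
kubectl rollout status deployment/myapp
```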
And the `<image>` does not exist in the registry for some reason. This is a very common case. If you have a bash script running this command, while loops in the bash script will not help you, because the command itself never returns. The only workaround, if you use bash, is to fork an asynchronous process from your script JUST to check the pod status in a smart enough way. Kubernetes is an awesome product, but its unfriendliness to being driven from automation scripts is a bit strange. The infinite kubectl timeout, the impossibility of waiting on an entity's status - all these minor things make Kubernetes much less automation-friendly than it could be.
Either I am doing something wrong or I misunderstood the concept behind kubectl wait. I use kubectl wait in my GitLab CI pipeline. Stage 1: deploy my app. In stage 2, I wanted to use kubectl wait to block until the pods are ready. When the deployment is not ready and no pod can be found with the labels "release=branch123" and "pod-name=php-nginx", the wait returns immediately instead of waiting. I hope my request is clear. Maybe I am just doing something wrong here?
@chucky2305 Yeah, unfortunately it instantly returns if the selector does not match (I made a bug report for that: #66456). Basically, it will only wait if the resource exists; it will not wait on nonexistent resources. What you could do (I believe) is put those labels on the deployment and wait until the deployment has a condition of Available. The benefit of this is that, since the deployment does exist, it won't instantly return. So your command would be something like the sketch below:
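The original command was not preserved; a hedged reconstruction of what was likely suggested, using the labels from the question (moved onto the deployment):

```bash
# Wait until the Deployment reports the Available condition.
kubectl wait --for=condition=available --timeout=120s \
  deployment -l release=branch123,pod-name=php-nginx
```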
For fellow noobs from Google wondering how to wait for a deployment update, see the sketch below.
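The snippet here appears to have been lost; the usual grep-free answer for this case is `kubectl rollout status`, which blocks until the rollout completes or fails (`myapp` is a placeholder):

```bash
kubectl rollout status deployment/myapp
```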
Most of what was mentioned above seems to have been accomplished; the only thing for which I still find myself using grep is checking a pod's phase. Not sure how it can be done purely using kubectl without the need for grep.
What about something like the following?
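The reply's snippet was not preserved; one grep-free way to check phase (an assumption, not necessarily what the commenter had in mind):

```bash
# List only pods whose phase is Running.
kubectl get pods --field-selector=status.phase=Running

# Or print a single pod's phase directly ("mypod" is a placeholder).
kubectl get pod mypod -o jsonpath='{.status.phase}'
```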
Spawned from #1325
It should be easy for users to do two things:
Readiness (#620) is a complex topic, and readiness can mean different things in different contexts. The Kubernetes client CLI and client library (see pkg/client/conditions.go) should provide tools for common readiness conditions and enable developers and administrators to easily script more complex readiness. This issue only covers client-side readiness; server-side readiness should be handled elsewhere.

Readiness must have an explicit upper bound (the system may never converge), probably manifested as a maximum timeout. Certain errors may be transient (network, server) and some fatal (resource deleted?). It should be possible for end users to understand the ways that readiness can fail and work through those conditions.
Most resources are likely to have an implicit "ready" state:
However, readiness can vary in infinitely complex ways.
It should be possible for users to define their own client-side ready conditions via scripting (potentially outside of kubectl), as long as kubectl provides a common layer of tools for this behavior.
Possible CLI examples (sketched below):
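The examples block was not preserved; these are hedged illustrations in the spirit of the proposal (none of these flags existed at the time, and the eventual `kubectl wait` took a different shape):

```bash
# Hypothetical flag forms in the spirit of this proposal:
kubectl create -f pod.yaml --wait=ready --timeout=60s
kubectl get pod frontend-pod-1 --wait=running
kubectl delete pod frontend-pod-1 --wait=deleted
```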
Things I'd like to avoid end users doing: `for | grep` loops on output, as much as possible.
loops on output as much as possibleThe text was updated successfully, but these errors were encountered: