Introduce stable pods #478
Conversation
by specifying MinReadySeconds in their PodSpec (w/o updating the ReplicaSet pod template).

## kubelet changes
For a Pod that specifies MinReadySeconds, kubelet will need to check (after MinReadySeconds)
i think in the absence of min ready seconds, available condition would just map with ready, no? i would rather all things report a condition instead of just a few...
Agreed, will update.
Both problems above can be solved by moving MinReadySeconds in the PodSpec. Once kubelet
observes that a Pod has been ready for at least MinReadySeconds without any of its containers
crashing, it will update the PodStatus with an Available condition set to Status=True.
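For concreteness, a rough sketch of what such a condition could look like in a Pod's status. The `PodAvailable` constant and the `Reason` string below are illustrative only; they are not existing core/v1 names.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PodAvailable is the condition type discussed in this proposal; it is not an
// existing core/v1 constant and is declared here only for illustration.
const PodAvailable corev1.PodConditionType = "Available"

func main() {
	// Roughly what kubelet would append to pod.Status.Conditions once the pod
	// has been Ready for at least MinReadySeconds without a container crash.
	cond := corev1.PodCondition{
		Type:               PodAvailable,
		Status:             corev1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
		Reason:             "MinReadySecondsElapsed", // illustrative reason string
	}
	fmt.Printf("%+v\n", cond)
}
```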
Can I still ask for a better name? Is "Stable" or "StableReady" or "ReadyStable" problematic?
Stable sgtm
// without any of its container crashing, for it to be considered available.
// Defaults to 0 (pod will be considered available as soon as it is ready)
// +optional
MinReadySeconds *int32 `json:"minReadySeconds,omitempty"`
Correspondingly "StableReadySeconds", or Min... ?
@smarterclayton @bgrant0607 thoughts? This makes sense if we roll with "Stable" but this entails that we then need to update the status fields in our controllers from AvailableReplicas to StableReplicas (something to keep in mind for v2 I guess at this point)
Why? You don't need to use the same name everywhere - names are contextual.
Trying to convey the same thing via different names is at least confusing to me as a user.
You must be "Ready" for "X Seconds" in order to be "Stable". "StableAfterReadySeconds" is awkward but is somewhat closer.
I think I still prefer MinReadySeconds ("Minimum Seconds the pod needs to be Ready") but I won't hang on it if you guys feel strong about changing it.
I don't know that I love any of the new names.
SecondsRequiredStableBeforeReady?
@davidopp isn't it an API convention to follow the *Seconds pattern? I think it is but I may be wrong.
## kubelet changes
For a Pod that specifies MinReadySeconds, kubelet will need to check (after MinReadySeconds)
if any of the Pod containers has crashed. If not, it will switch the Available condition to
Does this flip to false every time a container crashes, or is it latched to true once it is achieved?
after the initial period, if a container has crashed, how long does it wait? We don't have enough info (I think) to know WHEN it crashed. Maybe restate this as kubelet will reset the timer every time a container crashes until this condition is achieved?
Is this really EVERY container? What if it is OK for my helper container to crash occasionally? Reality is that it happens...
I would expect this to be similar to the Ready condition. If a Pod is Ready and then one of its containers crashes, the Pod stops being Ready.
I don't think that's true, is it?
If Stable is just a longer Ready.... why not increase the amount of time before you are Ready?
That's a good question, today MinReadySeconds is just that, an additional delay on top of Ready.
And the answer is that longer wait for Ready means you are not receiving traffic.
I guess you need to take traffic before you know whether you're ok. If you have a liveness or readiness check that also checks your business metrics, you can use that to signal a problem.
- created
- check readiness (no traffic, so no errors in your metrics)
- take traffic, users getting 500's
- check readiness (got errors, fail)
- go out of rotation
- ???
What happens at step six? You just oscillate in and out, causing some percentage of users to get failures? In OpenShift we can fail the deployment config, and I guess Deployments can hold the deployment and back off creation. Stability would be false, since we're oscillating.
So there seem to be two types of errors an application can have that result in a 500 a user will see: 1) those that make the application exit. In this case the pod goes out of rotation anyway, it never transitions to Stable, and the Deployment blocks. We can verify these simply by looking at the container statuses. 2) Bugs that may not cause the application to exit. The pod does not go out of rotation, but the Deployment still blocks. We can't look at container statuses, and readiness checks may not be enough unless business metrics are involved.
Both cases are covered by rerunning the readiness check, I guess. I need to call this out. Does it also mean that the endpoints controller will need to remove endpoints for Pods that never reach Stable?
Step 6 does not seem to be something that needs to be handled in this proposal. We have ProgressDeadlineSeconds, and there is AutoRollback that still needs to be implemented.
For a Pod that specifies MinReadySeconds, kubelet will need to check (after MinReadySeconds)
if any of the Pod containers has crashed. If not, it will switch the Available condition to
Status=True in the status of the Pod. Pods that don't specify MinReadySeconds won't have
the Available condition set in their status.
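Purely as an illustration of the semantics discussed in this thread (including the point above about the window effectively restarting when a container crashes), a sketch of the decision kubelet might make; the helper and its inputs are hypothetical plumbing, not proposed API.

```go
package main

import (
	"fmt"
	"time"
)

// podAvailable sketches the proposed kubelet check: a pod becomes Available
// once it has been Ready for at least minReadySeconds without any of its
// containers crashing in that window. All parameters here are made-up
// plumbing; the real integration point inside kubelet is not specified
// by this proposal excerpt.
func podAvailable(readySince time.Time, crashedSinceReady bool, minReadySeconds int32, now time.Time) bool {
	if readySince.IsZero() || crashedSinceReady {
		// Not Ready, or a container crashed after the pod became Ready:
		// the clock effectively restarts once the pod is Ready again.
		return false
	}
	return now.Sub(readySince) >= time.Duration(minReadySeconds)*time.Second
}

func main() {
	readyAt := time.Now().Add(-45 * time.Second)
	fmt.Println(podAvailable(readyAt, false, 30, time.Now())) // true: ready long enough, no crashes
	fmt.Println(podAvailable(readyAt, true, 30, time.Now()))  // false: a crash reset the window
}
```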
Most conditions are assumed to have their default value == the Go zero value, and are named as such, I think. Brian has better context on this, but it seems weird to me that nonexistent implies true
Yes, similar to Derek's comment, we probably want to add the condition to all Pods and default to false. I was thinking though that such an action would be disruptive but then again in order to upgrade a Node you need to drain it first so I am probably wrong.
I just think it doesn't convey to a naive user what it does.
> I just double-checked because it felt for a moment like I am crazy but yes, this is true. Once a container crashes, the Ready condition switches back to False. Of course, if the container is restarted and comes back to a healthy state the condition will transition back to True and then you end up crashlooping.

really... I do not think that is what SHOULD happen, but I accept that it might be what DOES happen.
I remember that this didn't use to be the case, but you don't want to continue sending traffic to broken Pods is my understanding of it. Furthermore, it may be just your sidecar that is failing, but there is no way today to separate between the main container and the sidecar via the API... Or at least not a way I am aware of.
Well, we control the API, so we could have added a flag indicating whether
a particular container should flip "ready" or not.
Yes, I believe it's reasonable. Today effectively all containers are
affecting Ready by default. By adding that flag you can block switching
Ready (and Stable) back n forth :)
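To make the idea concrete, a hypothetical per-container field along these lines could look as follows, mirroring the style of the field snippet above; no such field exists in the API today.

```go
// Hypothetical field on Container, shown only to illustrate the flag discussed
// above; it is not part of the API.

// AffectsPodReadiness controls whether this container's readiness (and crashes)
// are factored into the pod-level Ready and Stable conditions.
// Defaults to true, which matches today's behavior where every container counts.
// +optional
AffectsPodReadiness *bool `json:"affectsPodReadiness,omitempty"`
```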
and a new PodConditionType:
```go
// PodAvailable is added in a ready pod that has MinReadySeconds specified. The pod
```
The second sentence of this comment is confusing. BTW is it worth stating somewhere in this doc how Ready vs Available are treated by services?
Tried to rephrase this one. Also added a sentence where I explicitly mention that Services are not going to be affected by this change.
## Future work
The PDB controller will need to be extended to recognize the Available condition in Pods.
It may also need to get into the bussiness of estimating availability for Deployments/ReplicaSets
business
done
backwards-compatible with old kubelets.

## Future work
The PDB controller will need to be extended to recognize the Available condition in Pods.
Might want to cite #34776 where we had some discussion of this topic.
done
Addressed most of the comments, PTAL. Still tentative is the name in the PodSpec. I still prefer MinReadySeconds over all current suggestions because 1) it's less ugly and 2) it's already used by three different controllers (so people should already be familiar with it).
cc: @mfojtik since somebody from PM may need to drive the implementation :)
@kubernetes/sig-node-api-reviews
2. PodDisruptionBudget is working with Ready Pods and has no notion of MinReadySeconds
when used by workload controllers.
3. The workload controllers cannot take advantage of readiness checks when checking the
state of their underlying Pods.
Another problem I found while debugging is that it's very hard to reason about which pods are not available from a controller, because it only reports aggregated data. Imagine a ReplicaSet with 200 replicas, and only 3 of them are unavailable.
Agreed. Updated the proposal.
the PodSpec if it is specified either in their Spec or PodTemplateSpec. The value in the
PodTemplateSpec should stomp the value in the Spec. For Pods that do not specify
MinReadySeconds (those running on old kubelets), the controllers will continue to use the
current approach for estimating stability. Eventually, we should switch all controllers that
Can we make PodTemplate.Spec.MinReadySeconds default to whatever is set in the controller spec when creating pods (we don't need to touch PodTemplate so won't cause additional rollouts)? It overwrites controller Spec.MinReadySeconds only when it's set (not nil).
This is what I mean by "The ReplicaSet and DaemonSet controllers will create new Pods by setting MinReadySeconds in the PodSpec if it is specified either in their Spec or PodTemplateSpec. The value in the PodTemplateSpec should stomp the value in the Spec."
gotya
I also mention it above in the API changes section.
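A small sketch of that precedence rule, for illustration only (the helper name is made up): the pod template's value wins when set, otherwise the controller's own spec value is used.

```go
package main

import "fmt"

// effectiveMinReadySeconds is a made-up helper that picks the value a
// controller would write into new Pods: the pod template's value wins if it
// is set, otherwise the value from the controller's own spec is used.
func effectiveMinReadySeconds(controllerSpec int32, podTemplate *int32) *int32 {
	if podTemplate != nil {
		return podTemplate
	}
	return &controllerSpec
}

func main() {
	tmpl := int32(30)
	fmt.Println(*effectiveMinReadySeconds(10, &tmpl)) // 30: the template value stomps the spec
	fmt.Println(*effectiveMinReadySeconds(10, nil))   // 10: fall back to the controller spec
}
```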
Question: Do we need a separate kind of probe (stabilityProbe?) or should we re-use
the readinessProbe? What if a user needs to take into account different kinds of metrics
when doing the stability check than those used in her readiness probe?
Reusing readinessProbe makes sense to me.
sgtm
@wanghaoran1988 is going to start working on the kubelet side of things
Pods can be created using it without updating the controller's PodTemplateSpec.

### kubelet changes
For a Pod that specifies MinReadySeconds, kubelet will need to rerun the readiness check
@Kargakis Why do we need to rerun the readiness check? Can we just wait "minReadySeconds" seconds after it's ready and check if the pod is still ready?
My understanding is that ideally that check would be able to identify failures that are not caught by readiness (by using metrics exposed by the app), so we may want to think about introducing a net-new probe. @kubernetes/sig-node-proposals @janetkuo @smarterclayton Or do you guys think re-running the readiness check is enough?
I'm not able to move this forward anymore so I am going to close.
@Kargakis why are you closing this?
FWIW I think this is a good idea in general and we should try to implement it, or something like it, eventually.
concur with @davidopp
I think this is still a good idea and want to share some context. I'm also interested if there are any related developments elsewhere that anyone can link me to or if we think the right SIGs/contributors know about this suggestion?

At Yelp we've had one form or another of this problem since before we even used kubernetes! I would describe it like so: "When we launch a new container it takes some time to be discoverable in whatever service mesh technology you choose. In large clusters, proxies or iptables rules or whatever might have to be updated on many nodes or inside many pods. If you remove old versions of the container before this happens your service will be under capacity and likely to send errors".

Before kubernetes we had our own deployment code that checked a sample of our hosts to check that the new containers had converged in our mesh before the deployment proceeded to kill the old containers. In kubernetes we "abuse" the readiness check by doing a similar check of our service mesh before saying a Pod is ready. We could alternatively use min ready seconds but in cases where our mesh infrastructure goes wrong or is really slow for some reason this timer would expire and rather than a stalled deployment we have a service outage. I think in general people would prefer a stalled deployment. So I do think a second condition is needed for the deployment controller (or replica set controller?) to know when it's safe to proceed.

I think it should also be possible to define a separate probe because I think it's dependent on your choice of service mesh as to what you should check before proceeding. I don't know enough about kubernetes internals to comment with authority but it feels like it could be possible in a backwards compatible way where the deployment controller falls back to depending on readiness condition when this new condition isn't defined? And then users have a choice between min ready seconds which defines this new condition or they can define a custom check that overrides that?
We have this notion of pod readiness gates - which allows a 3rd party to
offer an opinion about whether a pod is ready, so the higher-order
controllers can proceed. Does that get to what you are describing?
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-readiness-gate
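For reference, a readiness gate is declared in the PodSpec and names an extra condition that some external controller (e.g. a mesh controller) must set to True before the pod is considered Ready; the condition type in this sketch is made up.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// This pod only becomes Ready once, in addition to its readiness probes,
	// an external controller sets the named condition to True on its status.
	// "example.com/mesh-converged" is an illustrative condition type.
	spec := corev1.PodSpec{
		ReadinessGates: []corev1.PodReadinessGate{
			{ConditionType: "example.com/mesh-converged"},
		},
	}
	fmt.Printf("%+v\n", spec.ReadinessGates)
}
```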
I think probably it doesn't (but certainly possible I misunderstand something). If the deployment controller and the service/endpoint controller both wait for all the conditions to be ready then we still have the same problem. Hypothetically if we could tell a deployment to only proceed with one readiness gate and a service to only mark an endpoint ready on a different readiness gate then I think that would probably work. But I don't think that's currently possible? I actually can't find the documentation that says when an endpoint is added/removed from a service...
Endpoints are added when the corresponding pod is ready, as indicated by
the Ready condition. I think I see what you are asking for - you want pods
to be included in the service when they are passing readiness probes, but
you don't want rollouts to proceed until that service is confirmed
"healthy" in some statistical manner. Is that right?
One way to do that would be to set `Service.spec.publishNotReadyAddresses`
but that is not REALLY what you want.
The problem is that readiness gates are material to the pod Ready
condition. We can't really add a pod to a Service before Ready (e.g. as
soon as readiness probe passes), because we don't know what readinessGates
might actually mean.
So what you'd really need is something in ReplicaSet (and Deployment, which
configures RS) which says "which pod condition should I look at to
determine pod-ready" (the proverbial "another level of indirection"). Then
you could set that to a "service is happy" indicator, which you can set by
your own heuristics. Looking at the code for RS controller, that actually
seems quite simple to implement, but I am not sure what other
considerations are required - that subsystem is not my expertise. We would
need a sig-apps KEP, for sure, though it should be pretty short
(relatively).
Tim
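Sketching that indirection (hypothetical, not an existing field), it could be as small as letting ReplicaSet/Deployment name the pod condition they treat as readiness, defaulting to Ready:

```go
// Hypothetical addition to ReplicaSetSpec (and DeploymentSpec), shown only to
// illustrate the indirection described above; it is not part of the API.

// ReadyConditionType names the pod condition this controller consults when
// deciding whether a replica counts as ready/available.
// Defaults to "Ready".
// +optional
ReadyConditionType string `json:"readyConditionType,omitempty"`
```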
Yep exactly, and indeed I think it depends on what service mesh or routing people use as to what that check would be. But the overall problem in my mind is generic to any mesh technology.
I think that sounds pretty sensible and would work well for our case. I'll have some discussions about this, I'm not sure how urgently we need to fix our current approach but I suspect we'll have to tackle this eventually so would be nice to get it described in a KEP and contribute back if we can.
Awesome. I'm happy to look at a KEP, though I should not be approver for
it, probably. Honestly, it feels like one field on RS, one on Deployment,
and maybe one on StatefulSet. Then some plumbing work to pass the
indirection around and some good devex (events, logs, etc) and tests.
@kubernetes/sig-apps-api-reviews @kubernetes/api-reviewers