refactor pkg/health into more reusable pkg/probe #3695

mikedanese · 2015-01-21T21:35:18Z

This is a proposal to add a concept of Context to the HealthCheck. The Context represents the setting in which a container's health is being queried. Right now the only context used is Liveness, but it's intended that Readiness will follow shortly. The Context is used to access the correct LivenessProbe from an api.Container.

To accommodate this change, some of the HealthCheck impls have been migrated to a separate interface called ProbeRunner so they are no longer responsible for accessing the LivenessProbe directly from the api.Container.

The ultimate goal of this work is to create a mechanism for supporting readiness checks as per #620. The next steps towards this goal might be:

Add a ReadinessProbe struct to api/types.go and break the components shared with LivenessProbe into a separate HealthyProbe; add ReadinessProbe to api.Container; change pkg/health to work with HealthyProbe.
Use ReadinessProbe checks in the kubelet to control available Endpoints.

I'll be working on the api changes next. I can either WIP this PR and append the commits or I can do that work in separate PRs. I am submitting so we can discuss whether this path is on track, or whether this is being worked on somewhere else.

thockin · 2015-01-22T18:02:48Z

I get the Check/Probe rename. I am less sure about the rest. I feel like this is forcing commonality on things that aren't really the same.

I'd rather see probes be more decoupled from Kubernetes concepts like pod. The RunInContainer could be passed in by kubelet as an implementation of util/exec.Interface, or flip it around and define and receive an interface that exec implements. That way all the pod and docker and whatever logic stays firmly in kubelet, and probes stay simple.

Then I wonder if this Context type is really worthwhile for just two instances, as opposed to just making two calls to something like RegisterLivenessProbes(NewHTTPProbe(), NewTCPProbe(), NewExecProbe()) and RegisterReadinessProbes(...). This has the side effect that the two sets of allowed probes are not necessarily the same list.
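To make the registration idea concrete, here is a minimal sketch of two independent registries, so that the set of probes allowed for liveness need not match the set allowed for readiness. The `Prober` interface, the `Registry` type and the constructor bodies are hypothetical placeholders for illustration; only the constructor names come from the comment above.

```go
package main

import "fmt"

// Prober is a hypothetical interface; a real one would also expose a way
// to run the probe.
type Prober interface {
	Kind() string
}

// kindProber is a placeholder implementation that only carries its kind.
type kindProber string

func (k kindProber) Kind() string { return string(k) }

// Hypothetical constructors, named after the ones in the comment above.
func NewHTTPProbe() Prober { return kindProber("http") }
func NewTCPProbe() Prober  { return kindProber("tcp") }
func NewExecProbe() Prober { return kindProber("exec") }

// Registry holds the probers allowed for one purpose (liveness or readiness).
type Registry struct {
	probers map[string]Prober
}

// NewRegistry builds a registry from an explicit list of probers.
func NewRegistry(ps ...Prober) *Registry {
	r := &Registry{probers: make(map[string]Prober)}
	for _, p := range ps {
		r.probers[p.Kind()] = p
	}
	return r
}

// Lookup returns the prober for kind, or false if that kind is not allowed.
func (r *Registry) Lookup(kind string) (Prober, bool) {
	p, ok := r.probers[kind]
	return p, ok
}

func main() {
	// The two lists are deliberately different.
	liveness := NewRegistry(NewHTTPProbe(), NewTCPProbe(), NewExecProbe())
	readiness := NewRegistry(NewHTTPProbe(), NewTCPProbe())

	_, ok := liveness.Lookup("exec")
	fmt.Println("liveness allows exec:", ok)
	_, ok = readiness.Lookup("exec")
	fmt.Println("readiness allows exec:", ok)
}
```

With this shape no Context type is needed: the purpose is implied by which registry the caller consults.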

I'll chew on this a bit more, but thoughts?

mikedanese · 2015-01-22T20:26:52Z

I agree with the need to decouple checks from Kubernetes concepts. A "ProbeRunner" should not assume that the probe will always be accessed through a specific location in an api object, which is currently preventing the HealthCheck from being used in contexts other than Liveness. What I would like to see is a ProbeRunner interface that accepts a HealthyProbe (like a LivenessProbe but more generic) and some struct, possibly as per this TODO (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/types.go#L277), that contains the information/args required to do the check.
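As a rough illustration of that interface shape, the sketch below has a runner accept a generic probe plus a separate struct of arguments, so the same runner serves liveness and readiness alike. Every name here (`HealthyProbe` fields, `Target`, `fakeRunner`) is an assumption made for the example, and the fake runner only consults an in-memory set of "open" ports rather than doing real I/O.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a simplified stand-in for the health status type.
type Status int

const (
	Unknown Status = iota
	Healthy
	Unhealthy
)

// HealthyProbe is a hypothetical generic probe description. It does not
// know where in an API object it came from.
type HealthyProbe struct {
	Kind string // "http", "tcp" or "exec"
	Port int
	Path string
}

// Target is a hypothetical struct carrying the arguments needed to run a check.
type Target struct {
	Host          string
	ContainerName string
}

// ProbeRunner runs any probe against any target.
type ProbeRunner interface {
	RunProbe(p HealthyProbe, t Target) (Status, error)
}

// fakeRunner reports a probe as healthy when its port is in the open set.
type fakeRunner struct {
	open map[int]bool
}

func (f fakeRunner) RunProbe(p HealthyProbe, t Target) (Status, error) {
	if t.Host == "" {
		return Unknown, errors.New("no host to probe")
	}
	if f.open[p.Port] {
		return Healthy, nil
	}
	return Unhealthy, nil
}

func main() {
	var runner ProbeRunner = fakeRunner{open: map[int]bool{8080: true}}
	target := Target{Host: "10.0.0.5", ContainerName: "web"}

	// The caller decides what each probe means; the runner does not care.
	liveness := HealthyProbe{Kind: "tcp", Port: 8080}
	readiness := HealthyProbe{Kind: "http", Port: 9090, Path: "/ready"}

	s, err := runner.RunProbe(liveness, target)
	fmt.Println("liveness healthy:", s == Healthy, err)
	s, err = runner.RunProbe(readiness, target)
	fmt.Println("readiness healthy:", s == Healthy, err)
}
```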

The current implementation of pkg/health is very tightly bound to performing HealthChecks on a single container. I wasn't sure if that was the ultimate intention, but I erred on the side of leaving that intact as much as possible while still refactoring pkg/health to be useful for contexts other than Liveness.

I also think that Context is a terrible and overloaded word but I'm struggling to find a word that fits well. I don't think that the type is necessary.

I'm still trying to figure out how best to accomplish this.

thockin · 2015-01-22T21:34:50Z

I am on-call today, so I won't have much time to think about this very
deeply, but I'll let it simmer and spend more time on it tomorrow or maybe
tonight. I had health-checks on my radar for fugly code cleanup anyway :)

mikedanese · 2015-01-22T23:03:48Z

No worries, take your time. I've tried factoring the api objects out of the pkg/health/health.go interfaces, so that the client passes in a Probe and a "context" (unrelated to my previous use of the word context) required to execute the Probe. This has the benefit of decoupling api types from the interfaces, isolates most of the mess in the ProbeRunner implementations, and would allow a client to pass in HealthyProbes in settings other than Liveness.

bgrant0607 · 2015-01-22T23:09:41Z

Ooh, I'd love to have readiness checks. I'll take a look.

FYI, something similar also applies to nodes, though in that case we've focused on reporting the "condition" rather than specifying how it is determined:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/v1beta1/types.go#L551

mikedanese · 2015-01-23T00:44:00Z

It does seem similar although NodeConditionKind is not mutually exclusive. ProbeRunner could probably be adapted to work for both uses, but I wouldn't want to shoehorn it. I'll think about whether it's possible to reconcile the two. It would be nice if Node checks used pkg/health.

thockin · 2015-01-23T07:15:04Z

I spent a little time looking at the existing health check code and I feel
like it is inside out. It is trying to be plugin-ish, but it really isn't
the sort of thing I would expect to see plugins for, and it's not
abstracted enough to operate that way.

What if we backed down on the plugin-style here?

Nobody but kubelet uses pkg/health EXCEPT for client/kubelet and a few
places that use health.Status. pkg/health understands Container and Pod
concepts, but why should it? I think it would be simpler to make
pkg/health become pkg/probe, with sub-packages for exec, tcp, http. The
probe pkg would have the status codes and anything common. The individual
sub-pkgs would focus on tight, simple probe implementations with pretty
consistent interfaces (as much as possible, but not artificially so) and NO
pod or container specific tie-ins.

For example, the pkg/probe/exec.Probe type would have a method Run(exec
utilexec.Interface) (or something like that). All of the RunInContainer
goop would move to Kubelet. Then, anyone who wanted to do a probe
operation (and there are a couple that are currently open coded to
net/http) would be able to create a pkg/probe/{kind}.Probe() and call Run()
on it.

pkg/probe would NOT treat probes as opaque plugins. To do that requires a
structure that can be probed against, which requires all probes to be
available everywhere and that just doesn't make sense. Maybe kubelet can
do that, but it exists within kubelet only. Simpler might be to just code
it out directly. Something like:

func (kl *Kubelet) runProbe(probe *api.Probe) health.Status {
    // pod and c (the container index) are assumed to be in scope in this sketch.
    if probe.Exec != nil {
        return allProbes["exec"].Probe(newExecInContainer(pod.UID, pod.Spec.Containers[c]))
    }
    if probe.HTTP != nil {
        return allProbes["http"].Probe(pod.Status.PodIP, probe.HTTP.URL)
    }
    if probe.TCP != nil {
        return allProbes["tcp"].Probe(pod.Status.PodIP, probe.TCP.Port)
    }
    // No action is set on the probe, so there is nothing to run.
    return health.Unknown
}

Then you can use the same structure for both liveness and readiness, and
the same code to process it, but that logic lives inside Kubelet.
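For readers who want something they can compile, the dispatch idea can be sketched outside of kubelet as follows. The action types, the `Probers` bundle of functions and the stub implementations are all hypothetical; the point is only that one `runProbe` serves both liveness and readiness because it switches on which action is set, not on what the probe is for.

```go
package main

import "fmt"

// Status is a simplified stand-in for health.Status.
type Status int

const (
	Unknown Status = iota
	Healthy
	Unhealthy
)

func (s Status) String() string {
	switch s {
	case Healthy:
		return "healthy"
	case Unhealthy:
		return "unhealthy"
	}
	return "unknown"
}

// Hypothetical action types; exactly one pointer in Probe is expected to be set.
type ExecAction struct{ Command []string }
type HTTPAction struct {
	Port int
	Path string
}
type TCPAction struct{ Port int }

type Probe struct {
	Exec *ExecAction
	HTTP *HTTPAction
	TCP  *TCPAction
}

// Probers bundles the low-level implementations. The caller owns them, so
// pod and container details never leak into the probe code.
type Probers struct {
	Exec func(command []string) Status
	HTTP func(host string, port int, path string) Status
	TCP  func(host string, port int) Status
}

// runProbe dispatches on whichever action is set.
func runProbe(ps Probers, podIP string, p *Probe) Status {
	switch {
	case p == nil:
		return Unknown
	case p.Exec != nil:
		return ps.Exec(p.Exec.Command)
	case p.HTTP != nil:
		return ps.HTTP(podIP, p.HTTP.Port, p.HTTP.Path)
	case p.TCP != nil:
		return ps.TCP(podIP, p.TCP.Port)
	}
	return Unknown
}

func main() {
	// Stub implementations so the example runs without a network or containers.
	ps := Probers{
		Exec: func(command []string) Status { return Healthy },
		HTTP: func(host string, port int, path string) Status { return Unhealthy },
		TCP:  func(host string, port int) Status { return Healthy },
	}

	liveness := &Probe{TCP: &TCPAction{Port: 8080}}
	readiness := &Probe{HTTP: &HTTPAction{Port: 8080, Path: "/ready"}}

	fmt.Println("liveness:", runProbe(ps, "10.0.0.5", liveness))
	fmt.Println("readiness:", runProbe(ps, "10.0.0.5", readiness))
}
```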

I've rambled quite a bit, so I'll hand the mic back to you :).  I see this
as a few days to a week of work, probably.  I'm not going to force that on
you just to get readiness done, but the more we pile stuff on top of the
existing mess, the harder it will be to disentangle.

Tim

mikedanese · 2015-01-23T17:34:14Z

I agree that pkg/health should be rethought. Readiness is something I'll be working towards over the next couple weeks but it's low enough priority that I wouldn't put off doing any cleanup I can along the way. I'll think about your suggestions and work on a first pass at this today.

mikedanese · 2015-01-23T21:48:11Z

@thockin I've taken a first pass at what I think a pkg/probe could look like and I've tried to address your concerns. PTAL and let me know if I'm on track. I can add tests and move things over that depend on pkg/health if you prefer this.

bgrant0607 · 2015-01-24T08:24:12Z

pkg/kubelet/probe.go

+	"github.com/golang/glog"
+)
+
+func (kl *Kubelet) makeLivenessProbeRunner(podFullName string, podUID types.UID, status api.PodStatus, container api.Container) probe.ProbeRunner {


I'd just pass in LivenessProbe and any other information from the container needed (e.g., name). That would make it easier to reuse this for readiness probes.

We should change the name of type LivenessProbe to Probe.

mikedanese · 2015-01-25T22:54:37Z

@bgrant0607 rather than renaming LivenessProbe, we could break the Actions into a Probe type and maintain {Context}Probes that hold a Probe and context specific settings. e.g.

type Probe struct {
    HTTPGet   *HTTPGetAction   `json:"httpGet,omitempty"`
    TCPSocket *TCPSocketAction `json:"tcpSocket,omitempty"`
    Exec      *ExecAction      `json:"exec,omitempty"`
}

type LivenessProbe struct {
    Probe               `json:",inline"`
    InitialDelaySeconds int64 `json:"initialDelaySeconds,omitempty"`
}

type ReadinessProbe struct {
    Probe              `json:",inline"`
    TimeoutSeconds     int64 `json:"timeoutSeconds,omitempty"`
    FrequencySeconds   int64 `json:"frequencySeconds,omitempty"`
    UnhealthyThreshold int64 `json:"unhealthyThreshold,omitempty"`
    HealthyThreshold   int64 `json:"healthyThreshold,omitempty"`
}

I'm not sure if this is overthinking it, as most settings could make sense for most contexts, but I can see a few benefits to this approach. We could add configs to ContextProbes as they become implemented (i.e. we wouldn't have config fields that do things in some spots and are unimplemented in others). We could also have Context-specific settings. Alternatively, we could rename LivenessProbe to Probe now and break out ContextProbes later on if needed. Either way, I'm happy to work on these api changes. I will probably do this in a separate PR as this one is already looking large.

mikedanese · 2015-01-26T23:24:36Z

@thockin code and tests are moved over from pkg/health. Sorry about the size. I tried to break it up logically by commit. PTAL.

thockin · 2015-01-27T06:10:24Z

pkg/probe/exec/exec.go

+	if err != nil {
+		return probe.Unknown, err
+	}
+	if strings.ToLower(string(data)) != defaultHealthyOutput {


Shouldn't we just be checking for a zero exit status?

My only concern with this is that we would be changing the behavior of the LivenessProbe api (although it's currently not documented). Some ExecActions that used to return Unhealthy would return Healthy, and some that used to return Unknown would return Unhealthy.

This seems reasonable, but I wouldn't want to bury a change that potentially breaks the api. I'll make the change if this is not a concern.

Also, for an Exec probe, there would be no way to differentiate between a failure of the probe itself and a failure of the thing being probed. We'd just assume that the failure was in the thing being probed, i.e. no Unknowns.

Well-considered points. I am not sure I like it, but you're right that this would be a change in semantics. Leave it as is. I'll file an issue to discuss.

mikedanese · 2015-01-27T19:30:05Z

@bgrant0607 hadn't seen that TODO. I'll be playing around with this in #3818.

thockin · 2015-01-28T06:05:00Z

Status on this? Is it ready for another final review and commit?

mikedanese · 2015-01-28T06:16:07Z

Yes, thanks. I've addressed 2/3 of your comments. If you want me to address the third (0 status code check only for exec probe) I can but I've replied why I was hesitant. Otherwise this is ready for review.

thockin · 2015-01-28T16:42:43Z

LGTM. I'm going to run e2e on this before commit.

thockin · 2015-01-28T19:00:24Z

e2e passes with the exception of PD, which has been having trouble.

@satnam6502 @brendandburns I am making a call to commit this - I don't think PD is related.

ddysher · 2015-01-29T02:02:02Z

This refactor rocks! @mikedanese FYI, I expect controller manager to also use this pkg for node health check.

mikedanese force-pushed the ready branch from a2d8863 to 7df88da Compare January 21, 2015 21:36

thockin self-assigned this Jan 22, 2015

mikedanese changed the title ~~Adds Context to pkg/health~~ WIP: refactor pkg/health to be useful in contexts other than Liveness Jan 22, 2015

mikedanese force-pushed the ready branch from 7df88da to 938b3ab Compare January 22, 2015 22:45

mikedanese force-pushed the ready branch 2 times, most recently from 3cdbb81 to 4402d46 Compare January 23, 2015 21:41

mikedanese force-pushed the ready branch 3 times, most recently from b9b0841 to 52fd703 Compare January 23, 2015 23:05

bgrant0607 reviewed Jan 24, 2015

mikedanese force-pushed the ready branch 2 times, most recently from bc3147a to 22f4207 Compare January 25, 2015 02:41

mikedanese force-pushed the ready branch from 22f4207 to cfdb2f0 Compare January 26, 2015 23:14

mikedanese changed the title ~~WIP: refactor pkg/health to be useful in contexts other than Liveness~~ WIP: refactor pkg/health into more reusable pkg/probe Jan 26, 2015

mikedanese mentioned this pull request Jan 27, 2015

break api.Probe out of api.LivenessProbe #3818

Merged

mikedanese changed the title ~~WIP: refactor pkg/health into more reusable pkg/probe~~ refactor pkg/health into more reusable pkg/probe Jan 27, 2015

thockin reviewed Jan 27, 2015

mikedanese added 3 commits January 27, 2015 11:20

remove pkg/health and move everything over to pkg/probe

a298402

add Probers to Probe pkgs.

6eb0b89

rename probe.Healthy to probe.Success and renam probe.Unhealthy to probe.Failure.

5dc6362

mikedanese force-pushed the ready branch from a3ecee5 to 5dc6362 Compare January 27, 2015 19:20

bgrant0607 mentioned this pull request Jan 27, 2015

Sync node status from node controller to master. #3733

Merged

thockin added a commit that referenced this pull request Jan 28, 2015

Merge pull request #3695 from mikedanese/ready

c8f6188

refactor pkg/health into more reusable pkg/probe

thockin merged commit c8f6188 into kubernetes:master Jan 28, 2015

mikedanese deleted the ready branch January 28, 2015 19:01

mikedanese mentioned this pull request Jan 31, 2015

move global state of Probers in pkg/kubelet/probe.go into the kubelet struct #4002

Closed

a-robinson mentioned this pull request Apr 30, 2015

The exec LivenessProbe keys off of the output of the exec'ed command rather than its return code #7587

Closed

moshe010 mentioned this pull request Feb 17, 2023

Extend the PodResources API to include resources allocated by DRA #115847

Merged
