Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor pkg/health into more reusable pkg/probe #3695

Merged
merged 5 commits into from
Jan 28, 2015

Conversation

mikedanese
Copy link
Member

This is a proposal to add a concept of Context to the HealthCheck. The Context represents the setting in which a containers health is being queried. Right now the only context used is Liveness, but it's intended that Readiness will follow shortly. The Context is used to access the correct LivenessProbe from an api.Container.

To accommodate this change, some of the HealthCheck impls have been migrated to a separate interface called ProbeRunner so they are no longer responsible for accessing the LivenessProbe directly from the api.Container.

The ultimate goal of this work is to create a mechanism for supporting readiness checks as per #620. The next steps towards this goal might be:

  • Add ReadinessProbe struct to api/types.go and break components shared with LivenessProbe into a separate HealthyProbe, add ReadinessProbe to api.Container, change pkg/health to work with HealthyProbe
  • Use ReadinessProbe checks in the kubelet to control available Endpoints.

I'll be working on the api changes next. I can either WIP this PR and append the commits or I can do that work in separate PRs. I am submitting so we can discuss whether this path is on track, or whether this is being worked on somewhere else.

@thockin
Copy link
Member

thockin commented Jan 22, 2015

I get the Check/Probe rename. I am less sure about the rest. I feel like this is forcing commonality on things that aren't really the same.

I'd rather see probes be more decoupled from kubernetes-concepts like pod. The RunInContainer could be passed in by kubelet as an implementation of util/exec.Interface or flip it around and define and receive an interface that exec implements. That wall all the pod and docker and whatever logic stays firmly in kubelet, and probes stay simple.

Then I wonder if this Context type is really worthwhile for just two instances, as opposed to just making two calls to something like RegisterLivenessProbes(NewHTTPProbe(), NewTCPProbe(), NewExecProbe()) and RegisterReadinessProbes(...). This has the side effect that the two sets of allowed probes are not necessarily the same list.

I'll chew on this a bit more, but thoughts?

@mikedanese
Copy link
Member Author

I agree with the need to decouple checks from kubernetes concepts. A "ProbeRunner" should not assume that the probe will always be accessed through a specific location in an api object, which is currently preventing the HealthCheck from being used in contexts other than Liveness. What I would like to see is a ProbeRunner interface that accepts a HealthyProbe (like a LivenessProbe but more generic) and some struct, possibly as per this TODO, that contains the information/args required to do the check.

The current implementation of pkg/health is very tightly bound to preforming HealthChecks on a single container. I wasn't sure if that was the ultimate intention but I erred on leaving that intact as much as possible while still refactoring pkg/health to be useful for contexts other than Liveness.

I also think that Context is a terrible and overloaded word but I'm struggling to find a word that fits well. I don't think that the type is necessary.

I'm still trying to figure out how best to accomplish this.

@mikedanese mikedanese changed the title Adds Context to pkg/health WIP: refactor pkg/health to be useful in contexts other than Liveness Jan 22, 2015
@thockin
Copy link
Member

thockin commented Jan 22, 2015

I am on-call today, so I won't have much time to think about this very
deeply, but I'll let it simmer and spend more time on it tomorrow or maybe
tonight. I had health-checks on my radar for fugly code cleanup anyway :)

On Thu, Jan 22, 2015 at 12:27 PM, Mike Danese notifications@github.com
wrote:

I agree with the need to decouple checks from kubernetes concepts. A
"ProbeRunner" should not assume that the probe will always be accessed
through a specific location in an api object, which is currently preventing
the HealthCheck from being used in contexts other than Liveness. What I
would like to see is a ProbeRunner interface that accepts a HealthyProbe
(like a LivenessProbe but more generic) and some struct, possibly as per
this TODO
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/types.go#L277,
that contains the information/args required to do the check.

The current implementation of pkg/health is very tightly bound to
preforming HealthChecks on a single container. I wasn't sure if that was
the ultimate intention but I erred on leaving that intact as much as
possible while still refactoring pkg/health to be useful for contexts other
than Liveness.

I also think that Context is a terrible and overloaded word but I'm
struggling to find a word that fits well. I don't think that the type is
necessary.

I'm still trying to figure out how best to accomplish this.

Reply to this email directly or view it on GitHub
#3695 (comment)
.

@mikedanese
Copy link
Member Author

No worries, take your time. I've tried factoring the api objects out of the pkg/health/health.go interfaces where the client passes in a Probe and a "context" (unrelated to my previous use of the word context) required to execute the Probe. This has the benefit of decoupling api types from the interfaces, isolates most of the mess in the PodeRunner implementations, and would allow a client to pass in HealthyProbes in settings other than Liveness.

@bgrant0607
Copy link
Member

Ooh, I'd love to have readiness checks. I'll take a look.

FYI, something similar also applies to nodes, though in that case we've focused on reporting the "condition" rather than specifying how it is determined:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/v1beta1/types.go#L551

@mikedanese
Copy link
Member Author

It does seem similar although NodeConditionKind is not mutually exclusive. ProbeRunner could probably be adapted to work for both uses, but I wouldn't want to shoehorn it. I'll think about whether it's possible to reconcile the two. It would be nice if Node checks used pkg/health.

@thockin
Copy link
Member

thockin commented Jan 23, 2015

I spent a little time looking at the existing health check code and I feel
like it is inside out. It is trying to be plugin-ish, but it really isn't
the sort of thing I would expect to see plugins for, and it's not
abstracted enough to operate that way.

What if we backed down on the plugin-style here?

Nobody but kubelet uses pkg/health EXCEPT for client/kubelet and a few
places that use health.Status. pkg/health understand Container and Pod
concepts, but why should it? I think it would be simpler to make
pkg/health become pkg/probe, with sub-packages for exec, tcp, http. The
probe pkg would have the status codes and anything common. The individual
sub-pkgs would focus on tight, simple probe implementations with pretty
consistent interfaces (as much as possible, but not artificially so) and NO
pod or container specific tie-ins.

For example, the pkg/probe/exec.Probe type would have a method Run(exec
utilexec.Interface) (or something like that). All of the RunInContainer
goop would move to Kubelet. Then, anyone who wanted to do a probe
operation (and there are a couple that are currently open coded to
net/http) would be able to create a pkg/probe/{kind}.Probe() and call Run()
on it.

pkg/probe would NOT treat probes as opaque plugins. To do that requires a
structure that can be probed against, which requires all probes to be
available everywhere and that just doesn't make sense. Maybe kubelet can
do that, but it exists within kubelet only. Simpler might be to just code
it out directly. Something like:

func (kl *Kubelet) runProbe(probe *api.Probe) health.Status {
    if probe.Exec != nil {
        return allProbes["exec"].Probe(newExecInContainer(pod.UID,
pod.Spec.Containers[c]))
    }
    if probe.HTTP != nil {
        return allProbes["http"].Probe(pod.Status.PodIP, probe.HTTP.URL)
    }
    if probe.TCP != nil {
        return allProbes["tcp"].Probe(pod.Status.PodIP, probe.TCP.Port)
    }
}

Then you can use the same structure for both liveness and readiness, and
the same code to process it, but that logic lives inside Kubelet.

I've rambled quite a bit, so I'll hand the mic back to you :).  I see this
as a few days to a week of work, probably.  I'm not going to force that on
you just to get readiness done, but the more we pile stuff on top of the
existing mess, the harder it will be to disentangle.

Tim

On Thu, Jan 22, 2015 at 4:44 PM, Mike Danese <notifications@github.com>
wrote:

> It does seem similar although NodeConditionKind is not mutually exclusive.
> ProbeRunner could probably be adapted to work for both uses, but I wouldn't
> want to shoehorn it. I'll think about whether it's possible to reconcile
> the two.
>
> --
> Reply to this email directly or view it on GitHub
> <https://github.com/GoogleCloudPlatform/kubernetes/pull/3695#issuecomment-71128910>
> .
>

@mikedanese
Copy link
Member Author

I agree that pkg/health should be rethought. Readiness is something I'll be working towards over the next couple weeks but it's low enough priority that I wouldn't put off doing any cleanup I can along the way. I'll think about your suggestions and work on a first pass at this today.

@mikedanese mikedanese force-pushed the ready branch 2 times, most recently from 3cdbb81 to 4402d46 Compare January 23, 2015 21:41
@mikedanese
Copy link
Member Author

@thockin I've taken a first pass at what I think a pkg/probe could look like and I've tried to address your concerns. PTAL and let me know if I'm on track. I can add tests and move things over that depend on pkg/health if you prefer this.

@mikedanese mikedanese force-pushed the ready branch 3 times, most recently from b9b0841 to 52fd703 Compare January 23, 2015 23:05
"github.com/golang/glog"
)

func (kl *Kubelet) makeLivenessProbeRunner(podFullName string, podUID types.UID, status api.PodStatus, container api.Container) probe.ProbeRunner {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just pass in LivenessProbe and any other information from the container needed (e.g., name). That would make it easier to reuse this for readiness probes.

We should change the name of type LivenessProbe to Probe.

@mikedanese mikedanese force-pushed the ready branch 2 times, most recently from bc3147a to 22f4207 Compare January 25, 2015 02:41
@mikedanese
Copy link
Member Author

@bgrant0607 rather than renaming LivenessProbe, we could break the Actions into a Probe type and maintain {Context}Probes that hold a Probe and context specific settings. e.g.

type Probe struct {
    HTTPGet   *HTTPGetAction   `json:"httpGet,omitempty"`
    TCPSocket *TCPSocketAction `json:"tcpSocket,omitempty"`
    Exec      *ExecAction      `json:"exec,omitempty"`
}

type LivenessProbe struct {
    Probe               `json:",inline"`
    InitialDelaySeconds int64 `json:"initialDelaySeconds,omitempty"`
}

type ReadinessProbe struct {
    Probe              `json:",inline"`
    TimeoutSeconds     int64 `json:"timeoutSeconds,omitempty"`
    FrequencySeconds   int64 `json:"frequencySeconds,omitempty"`
    UnhealthyThreshold int64 `json:"unhealthyThreshold,omitempty"`
    HealthyThreshold   int64 `json:"healthyThreshold,omitempty"`
}

I'm not sure if this is over thinking it as most settings could make sense for most contexts, but I can see a few benefits to this approach. We could add configs to ContextProbes as they become implemented (i.e. we wouldn't have config fields that do things in some spots and are unimplemented in others). We could also have Context specific settings. We could also rename LivenessProbe to Probe and break out ContextProbes later on if needed. Either way, I'm happy to work on these api changes. I will probably do this in a separate PR as this one is already looking large.

@mikedanese mikedanese changed the title WIP: refactor pkg/health to be useful in contexts other than Liveness WIP: refactor pkg/health into more reusable pkg/probe Jan 26, 2015
@mikedanese
Copy link
Member Author

@thockin code and tests are moved over from pkg/health. Sorry about the size. I tried to break it up logically by commit. PTAL.

@mikedanese mikedanese changed the title WIP: refactor pkg/health into more reusable pkg/probe refactor pkg/health into more reusable pkg/probe Jan 27, 2015
if err != nil {
return probe.Unknown, err
}
if strings.ToLower(string(data)) != defaultHealthyOutput {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just be checking for a zero exit status?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with this is that we would be changing the behavior of the LivenessProbe api (although it's currently not documented). Some ExecActions that used to return Unhealthy would return Healthy, and some that used to return Unknown would return Unhealthy.

This seems reasonable but I wouldn't want to burry a change that potentially breaks the api. I'll make the change if this is not a concern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for an Exec probe, there would be no way to differentiate a Failure on the part of the probe or on the part of the thing being probed. we'd just assume that it was a Failure on the thing being probed i.e. no Unknowns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well considered points. I am not sure I like it, but you;re right that this would be a change in semantic. Leave it as is. I'll file an issue to discuss

@mikedanese
Copy link
Member Author

@bgrant0607 hadn't seen that TODO. I'll be playing around with this in #3818.

@thockin
Copy link
Member

thockin commented Jan 28, 2015

Status on this? Is it ready for another final review and commit?

@mikedanese
Copy link
Member Author

Yes, thanks. I've addressed 2/3 of your comments. If you want me to address the third (0 status code check only for exec probe) I can but I've replied why I was hesitant. Otherwise this is ready for review.

@thockin
Copy link
Member

thockin commented Jan 28, 2015

LGTM. I'm going to run e2e on this before commit.

@thockin
Copy link
Member

thockin commented Jan 28, 2015

e2e passes with the exception of PD, which has been having trouble.

@satnam6502 @brendandburns I am making a call to commit this - I don't think PD is related.

thockin added a commit that referenced this pull request Jan 28, 2015
refactor pkg/health into more reusable pkg/probe
@thockin thockin merged commit c8f6188 into kubernetes:master Jan 28, 2015
@mikedanese mikedanese deleted the ready branch January 28, 2015 19:01
@ddysher
Copy link
Contributor

ddysher commented Jan 29, 2015

This refactor rocks! @mikedanese FYI, I expect controller manager to also use this pkg for node health check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants