
Support health (readiness) checks #620

Closed
dbcode opened this issue Jul 25, 2014 · 6 comments
Labels
area/api · area/app-lifecycle · area/usability · priority/awaiting-more-evidence · sig/network

Comments

@dbcode
Contributor

dbcode commented Jul 25, 2014

GCE network load balancers can be configured with health checks (periodic HTTP requests to a user-defined endpoint), such that instances are removed from the pool if they don't respond to the health checks promptly with a 200 status code.

Kubernetes should be able to reuse the same health checks, such that if a user has created a service that they wish to use from Kubernetes, their health checks will do what they expect them to do: cause any unhealthy instances to be removed from the load balancing pool until healthy again.

Ideally, if N frontends talk to M backends, this should not result in N x M health check HTTP requests per interval (i.e. each of the N frontends independently health checking each of the M backends). If that's not possible, maybe Kubernetes could transparently create and use a GCE network load balancer for each service that has more than a certain number of replicas (whether marked as "external" or not), instead of trying to do its own load balancing.

@brendandburns
Contributor

This is already basically possible.

The kubelet implements HTTP health checks, and restarts the container if it is failing. So no task should actually be failing for very long. This means that for M backends you only do M health checks.
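
For reference, a minimal sketch of the kubelet-driven HTTP liveness check described above, written in today's v1 pod-spec syntax (which did not exist in this form in 2014); the image name, port, and timing values are illustrative assumptions, and the /healthy path is borrowed from the discussion below:

```yaml
# Sketch only: an HTTP liveness probe run by the kubelet against its local
# container. A failing probe causes the kubelet to restart the container,
# so for M backends there are only M probe requests per interval.
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  containers:
  - name: backend
    image: example.com/backend:latest   # placeholder image
    livenessProbe:
      httpGet:
        path: /healthy                  # path taken from the discussion below
        port: 8080                      # illustrative port
      periodSeconds: 5                  # illustrative probe interval
      failureThreshold: 3               # restart after 3 consecutive failures
```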

Taking it a step further, we could consider adding health checks to the Service polling, but in some ways that seems redundant, since only healthy tasks should be in the service pool anyway.

@dbcode
Contributor Author

dbcode commented Jul 25, 2014

Can you clarify for me - let's say I have a cluster of 100 frontend containers using a backend service with 200 containers, and I have an HTTP health check on the backend service polling the URL "/healthy" every 5 seconds. How many requests to /healthy does each backend instance (container) see every 5 seconds?

Also, is restarting the container something that can be configured? I may not want to restart the container; e.g. on instance migration, I might want to just remove it from the LB pool 30-60 seconds before the migration takes place, and then put it back in once the migration is complete (thus minimizing broken connections).

@brendandburns
Contributor

The current health check is a "liveness" health check, performed at the backend container. Thus, each of your 200 backends would see only one health check every 5 seconds.

It is important to note that this is not a "readiness" health check, which would indicate that the container is ready to serve. We currently don't have a notion of "readiness", but we will add it eventually. When we do, we'll implement it in the same way, so the health check will still be at the level of the backend controller, not the frontend service, and health checks remain 1-to-1 with the backend container.

For your second point, that's exactly the reason for differentiating between liveness and readiness.
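
For readers following the thread today: the readiness notion discussed here was added later, and it ended up alongside the liveness probe in the pod spec much as described. A minimal sketch of the distinction in current v1 syntax, with the paths, port, and timings as illustrative assumptions:

```yaml
# Sketch only: liveness vs. readiness on the same container (fragment of a
# pod spec's containers section).
containers:
- name: backend
  image: example.com/backend:latest   # placeholder image
  livenessProbe:                      # failing probe => kubelet restarts the container
    httpGet:
      path: /healthy
      port: 8080
    periodSeconds: 5
  readinessProbe:                     # failing probe => pod is removed from the
    httpGet:                          # Service's endpoints; the container keeps running
      path: /ready
      port: 8080
    periodSeconds: 5
```

Because both probes are run by the kubelet on the pod's own node, probe traffic stays proportional to the number of backends rather than frontends × backends, matching the 1-to-1 property described above.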

@bgrant0607 changed the title from "Support health checks in Kubernetes load balancing pools" to "Support health (readiness) checks in Kubernetes load balancing pools" on Jul 25, 2014
@bgrant0607
Member

There are many scenarios where it is useful to differentiate between liveness and readiness:

  • Graceful draining
  • Startup latency
  • Offline for data reloading or other maintenance

And many components (any systems that disrupt pods and/or hosts + any systems that manage sets of pods) care about readiness: rollout tools, reschedulers, kernel updaters, worker pool managers, ...
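
The graceful-draining case can be sketched with the readiness probe that was eventually added: the probe fails while a drain flag is present, so the pod is taken out of the load-balancing pool without being restarted, and returns once the flag is removed. The exec command, the /tmp/drain path, and the timings below are purely illustrative assumptions, not anything specified in this issue:

```yaml
# Sketch only: a drain-flag readiness probe. Maintenance tooling would create
# /tmp/drain in the container shortly before a disruption and delete it
# afterwards; the container itself is never restarted.
readinessProbe:
  exec:
    command: ["sh", "-c", "test ! -f /tmp/drain"]   # hypothetical drain flag
  periodSeconds: 5
  failureThreshold: 1   # drop out of rotation on the first failed check
```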

@bgrant0607
Member

Readiness information would also be useful during rolling service updates.

@bgrant0607 changed the title from "Support health (readiness) checks in Kubernetes load balancing pools" to "Support health (readiness) checks" on Oct 2, 2014
@bgrant0607 added this to the v0.9 milestone on Oct 4, 2014
@bgrant0607 added the sig/network label on Oct 15, 2014
@bgrant0607 added the priority/backlog label on Dec 3, 2014
@bgrant0607 removed the priority/backlog label on Jan 9, 2015
@bgrant0607 added the priority/awaiting-more-evidence label on Jan 9, 2015
@goltermann removed this from the v0.9 milestone on Feb 6, 2015
@dbcode removed this from the v0.9 milestone on Feb 6, 2015
@bgrant0607
Member

Readiness has been implemented. Yeah! Kudos to @mikedanese.
