Support health (readiness) checks #620
Comments
This is already basically possible. The kubelet implements HTTP health checks and restarts the container if it is failing, so no task should actually be failing for very long. This means that for M backends you only do M health checks. Taking it a step further, we could consider adding health checks to the Service polling, but in some ways that seems redundant, since only healthy tasks should be in the service pool anyway.
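In today's API this maps to a livenessProbe in the pod spec: the kubelet on the node does the polling and restarts the container after repeated failures. A minimal sketch, with illustrative image name, port, path, and timings:

```yaml
# Minimal sketch of a kubelet-driven HTTP liveness check.
# Image name, port, path, and timings are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: backend
  labels:
    app: backend
spec:
  containers:
  - name: backend
    image: example.com/backend:latest   # hypothetical image
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthy
        port: 8080
      periodSeconds: 5       # poll every 5 seconds, as in the discussion above
      failureThreshold: 3    # restart the container after 3 consecutive failures
```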
Can you clarify for me: let's say I have a cluster of 100 frontend containers using a backend service with 200 containers, and I have an HTTP health check on the backend service polling the URL "/healthy" every 5 seconds. How many requests to /healthy does each backend instance (container) see every 5 seconds? Also, is restarting the container something that can be configured? I may not want to restart the container; e.g. on instance migration, I might want to just remove it from the LB pool 30-60 seconds before the migration takes place, and then put it back in once the migration is complete (thus minimizing broken connections).
The current health check is a "liveness" health check, performed at the backend container. Thus, each of your 200 backends would see only one health check every 5 seconds. It is important to note that this is not a "readiness" health check, which would indicate that the container is ready to serve. We currently don't have a notion of "readiness", but we will add it eventually. When we do, we'll implement it in the same way, keeping the check at the level of the backend controller rather than the frontend service, so health checks remain 1-1 with the backend container. As for your second point, that's exactly the reason for differentiating between liveness and readiness.
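When readiness did land, it took the same shape: a readinessProbe that the kubelet runs next to the liveness probe, except that a failing readiness check only pulls the pod out of the service's endpoints instead of restarting it. A hedged fragment of the container spec sketched above (the /ready path and timings are illustrative):

```yaml
# Fragment of the container spec above: readiness gates traffic, liveness restarts.
# An app can deliberately fail /ready (e.g. shortly before a migration) to drain
# itself from the load-balancing pool without being restarted.
    readinessProbe:
      httpGet:
        path: /ready          # hypothetical endpoint; return non-2xx while draining
        port: 8080
      periodSeconds: 5
      failureThreshold: 1     # drop out of the endpoints list after one failure
```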
There are many scenarios where it is useful to differentiate between liveness and readiness. And many components (any system that disrupts pods and/or hosts, plus any system that manages sets of pods) care about readiness: rollout tools, reschedulers, kernel updaters, worker pool managers, ...
Readiness information would also be useful during rolling service updates.
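For example, with the later Deployment API (which postdates this thread), a rolling update only retires old pods as new ones pass their readiness probe. A hedged sketch with illustrative names and numbers:

```yaml
# Sketch: readiness-gated rolling update. New pods must report Ready before
# old pods are removed, so serving capacity never drops below the configured floor.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 200
  selector:
    matchLabels:
      app: backend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove a ready pod before its replacement is Ready
      maxSurge: 10%       # bring up extras first, wait for readiness, then scale down old
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: example.com/backend:v2   # hypothetical new version
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
```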
Readiness has been implemented. Yeah! Kudos to @mikedanese. |
GCE network load balancers can be configured with health checks (periodic HTTP requests to a user-defined endpoint), such that instances are removed from the pool if they don't respond to the health checks promptly with a 200 status code.
Kubernetes should be able to reuse the same health checks, such that if a user has created a service that they wish to use from Kubernetes, their health checks will do what they expect them to do: cause any unhealthy instances to be removed from the load balancing pool until healthy again.
Ideally, if N frontends talk to M backends, this should not result in N x M health check HTTP requests per interval (i.e. each of the N frontends independently health checking each of the M backends). If that's not possible, maybe Kubernetes could transparently create and use a GCE network load balancer for each service that has more than a certain number of replicas (whether marked as "external" or not), instead of trying to do its own load balancing.
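For reference, the shape this eventually took avoids the N x M fan-out: each backend is probed once per interval by its local kubelet, and only pods that report Ready appear in the Service's endpoints, so whatever fronts the Service (kube-proxy or a cloud load balancer) only ever sees healthy backends. A hedged sketch with illustrative names:

```yaml
# Sketch: a Service whose pool membership is driven by pod readiness.
# Unready pods are removed from the endpoints automatically; no per-frontend
# health checking is needed.
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: LoadBalancer    # provisions a cloud LB (e.g. GCE) in supported environments
  selector:
    app: backend        # matches the pod labels from the sketches above
  ports:
  - port: 80
    targetPort: 8080
```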