Pod dependencies on services #2385
cc/ @bgrant0607 @satnam6502 I totally agree that we need a way to describe the inter-dependencies between pods and services. But I am not sure we should introduce such dependency logic into Pod, the primitive object hosted by Kubernetes. I think such dependencies should be described in the config layer, and interpreted and handled by controllers.
See also #1768
As a minimal thought:
What is messy about dependency declaration? I don't doubt it is -- I just don't have any experience or insight into this area to immediately understand why. At a higher level, I think it is a good point of principle to declare dependencies somewhere, so they can be used to understand the composition of the system, and so this information can be used by the system and tools to enforce and maintain constraints. Right now the dependency of my Kibana pod on the Elasticsearch service is expressed deep inside the container, via the environment variables that identify the Elasticsearch service.
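For concreteness, a sketch of what that implicit coupling looks like. The values below are invented, but the variable names follow Kubernetes' documented `{SVCNAME}_SERVICE_HOST` / `{SVCNAME}_SERVICE_PORT` convention for a service named `elasticsearch-logging`:

```yaml
# Invented values; variable names follow the {SVCNAME}_SERVICE_HOST /
# {SVCNAME}_SERVICE_PORT convention. These are only injected if the
# "elasticsearch-logging" service existed before the client pod started.
ELASTICSEARCH_LOGGING_SERVICE_HOST: 10.0.43.2
ELASTICSEARCH_LOGGING_SERVICE_PORT: 9200
```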
I tentatively agree with Dawn that one should express dependencies in the config layer; the config/deployment system can wait for readiness before deploying things that depend on other things. Anyway, pods must be able to robustly handle their dependencies not being up, because this is going to happen from time to time even if you fix the startup order.
I agree that it's nice to have dependencies in the config layer, but it sometimes leads to not handling dependencies inside a container/service. Additionally, sorting out the structure of a dependency tree can sometimes be unclear. Both approaches, failing hard and relying on declared dependencies, have their pros and cons.
Daniel nails it. Sometimes things you depend on go down. You have to handle that. Tim
Yeah - building your services to die cleanly (or retry failed connections) is the only way out of this hole. Config can and should only hide so much.
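To make the retry pattern concrete, here is a minimal sketch (not from the thread) of a client that waits for its dependency with capped backoff instead of assuming startup order. Assumptions: the image ships `nc` and a `kibana` binary, the `elasticsearch-logging` service predates this pod (so the env var exists), and Elasticsearch listens on 9200:

```yaml
# Minimal sketch: retry the dependency with capped exponential backoff,
# then exec the real process. Image contents and port are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: kibana-logging
spec:
  containers:
    - name: kibana
      image: kibana
      command: ["sh", "-c"]
      args:
        - |
          delay=1
          # Wait until the dependency accepts connections, backing off
          # between attempts rather than hot-looping.
          until nc -z "$ELASTICSEARCH_LOGGING_SERVICE_HOST" 9200; do
            echo "elasticsearch not ready, retrying in ${delay}s"
            sleep "$delay"
            [ "$delay" -lt 60 ] && delay=$((delay * 2))
          done
          exec kibana
```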
Also see #1899
But, yes, as others have mentioned, we don't depend on startup order internally, and users shouldn't depend on it, either. Until we have service DNS fully enabled, the one startup dependency we currently have is that services need to be created first, so that the environment variables will be present in their clients' containers.
Is it not worth making a distinction between "birth behaviour" and "running behaviour"? I don't strongly advocate making a distinction between bring-up behaviour and steady-state behaviour, but it seems worth asking. Satnam
This is something we thought about internally and decided was not a great idea, despite its obviousness (it comes up again and again). I don't think we need to revisit this any further. For the sake of the multi-hundred-item issues list, closing.
Can you add more detail about why it's not a great idea and what the solution is for the problems it was intended to solve? If there are any beyond those listed in the issue, that is. Just want to have a good reference for future questions.
I suggest that Marathon's app dependencies and health checks could serve as a reference for us.
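For reference, a Marathon app definition declares both dependencies and health checks in one place. A rough sketch follows; Marathon definitions are JSON, rendered as YAML here for consistency, and the app ids, paths, and timings are invented:

```yaml
# Rough sketch of a Marathon-style app definition (normally JSON).
id: /product/frontend
dependencies:
  - /product/db              # deployment waits until /product/db is healthy
healthChecks:
  - protocol: HTTP
    path: /health
    gracePeriodSeconds: 30
    intervalSeconds: 10
    maxConsecutiveFailures: 3
```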
We've stated elsewhere that health dependencies / preconditions should be handled by the applications themselves (e.g. by retrying), rather than by startup ordering enforced by the system. There may be others I'm missing.
I think this should be a Helm task
Just to add some context to this discussion (albeit very late). We have a three-node cluster with about 40 pods, an equal mix of Java and MySQL. All services are built in a resilient manner: if they can't find their dependencies, they just retry with backoff. We can stop and start the entire cluster and eventually things become accessible. The only problem is the time it takes.

If we start everything up at the same time, primary services (those that others depend on) take longer to start because of the high CPU and IO load caused by the other services starting up at the same time and constantly retrying to talk to the primary services. If we start everything up by hand, first the primary and then the secondary services, we can get the system running in half the time. So, for me, wasting resources during startup seems an unnecessary burden on the system.

During run time, we face similar issues. If a primary service goes down, the health check for a secondary service will fail too, causing both the primary and the secondary service to restart. Ideally there should be some dependency definition so that the system notices the primary is dead, restarts it first, and rechecks the secondary once the primary's health is back up, before restarting the secondary. In the Java world (Spring Cloud), we have a central config service, one service to rule them all... If that service dies, the cluster goes ape, restarting everything and causing many minutes of downtime, when in reality it could have just restarted the config service.

I therefore support the concept of dependency management at a higher level, for an effective startup sequence as well as better runtime health management.
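One partial mitigation that exists today for the cascading-restart problem described above (not a full dependency graph, and not something proposed in this thread): scope the liveness probe to the service's own process, and express dependency health only in the readiness probe, so a secondary is taken out of rotation rather than restarted while a primary is down. A sketch, with invented endpoints:

```yaml
# Sketch (endpoints invented). Liveness checks only the process itself, so
# a dead primary does not get the secondary killed; readiness checks the
# dependency, so the secondary merely stops receiving traffic until the
# primary recovers.
containers:
  - name: secondary
    image: example/secondary
    livenessProbe:
      httpGet:
        path: /alive          # must not depend on the primary
        port: 8080
    readinessProbe:
      httpGet:
        path: /ready          # fails while the primary is unreachable
        port: 8080
```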
One thing that is a bad experience at the moment is the bring-up behaviour of a pod that depends on the services of another pod. For example, in my logging work the Kibana viewer (pod, service) depends on the Elasticsearch (pod, service). When I try to bring them up together from my Makefile, the system sits in an intermediate state for quite a while: the Kibana viewer fails to start up because Elasticsearch is not ready yet. Eventually things start to look better, but even though the pods are marked as Running they are still not quite ready, and it takes another five minutes or so before one can make queries to Elasticsearch and see log output in Kibana.
It would be nice to describe in a pod declaration its dependencies on other services so this can be taken into account during scheduling. For example:
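(The original example snippet did not survive in this archived copy; the following is a hypothetical reconstruction of the proposed syntax. No such `dependsOn` field exists in the Pod API.)

```yaml
# Hypothetical reconstruction; "dependsOn" is not a real Pod field.
apiVersion: v1
kind: Pod
metadata:
  name: kibana-logging
spec:
  dependsOn:
    - kind: Service
      name: elasticsearch-logging   # wait until this service's pods are Running
  containers:
    - name: kibana
      image: kibana
```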
This would delay the scheduling of this pod until the pod(s) identified by the elasticsearch service are all in the running state.