Pod dependencies on services #2385

Closed
satnam6502 opened this issue Nov 14, 2014 · 18 comments
Labels
kind/design Categorizes issue or PR as related to design.

Comments

@satnam6502
Contributor

One thing that is a bad experience at the moment is the bring-up behaviour of one pod that depends on the services of another pod. For example, in my logging work the Kibana viewer (pod, service) depends on Elasticsearch (pod, service). When I try to bring them up together from my Makefile I have an intermediate state like this for quite a while:

NAME                           IMAGE(S)                                                                            HOST                                                           LABELS                      STATUS
influx-grafana                 kubernetes/heapster_influxdb,kubernetes/heapster_grafana,dockerfile/elasticsearch   kubernetes-minion-3.c.kubernetes-elk.internal/146.148.76.82    name=influxdb               Pending
heapster                       kubernetes/heapster                                                                 kubernetes-minion-1.c.kubernetes-elk.internal/130.211.126.68   name=heapster               Running
synthetic-logger-0.25lps-pod   ubuntu:14.04                                                                        kubernetes-minion-1.c.kubernetes-elk.internal/130.211.126.68   name=synth-logging-source   Running
elasticsearch-pod              dockerfile/elasticsearch                                                            kubernetes-minion-2.c.kubernetes-elk.internal/23.236.59.213    app=elasticsearch           Pending
kibana-pod                     kubernetes/kibana:latest                                                            kubernetes-minion-4.c.kubernetes-elk.internal/130.211.121.21   app=kibana-viewer           Failed

i.e. the Kibana viewer fails to start up because Elasticsearch is not ready yet. Eventually things start to look better:

NAME                           IMAGE(S)                                                                            HOST                                                           LABELS                      STATUS
influx-grafana                 kubernetes/heapster_influxdb,kubernetes/heapster_grafana,dockerfile/elasticsearch   kubernetes-minion-3.c.kubernetes-elk.internal/146.148.76.82    name=influxdb               Pending
heapster                       kubernetes/heapster                                                                 kubernetes-minion-1.c.kubernetes-elk.internal/130.211.126.68   name=heapster               Running
synthetic-logger-0.25lps-pod   ubuntu:14.04                                                                        kubernetes-minion-1.c.kubernetes-elk.internal/130.211.126.68   name=synth-logging-source   Running
elasticsearch-pod              dockerfile/elasticsearch                                                            kubernetes-minion-2.c.kubernetes-elk.internal/23.236.59.213    app=elasticsearch           Running
kibana-pod                     kubernetes/kibana:latest                                                            kubernetes-minion-4.c.kubernetes-elk.internal/130.211.121.21   app=kibana-viewer           Running

but even though the pods are marked as Running, they are still not quite ready: it takes another five minutes or so before one can make queries to Elasticsearch and see log output in Kibana.

It would be nice to be able to describe, in a pod declaration, its dependencies on other services so this can be taken into account during scheduling. For example:

apiVersion: v1beta1
kind: Pod
id: kibana-pod
desiredState:
  manifest:
    version: v1beta1
    id: kibana-server
    containers:
      - name: kibana-image
        image: kubernetes/kibana:latest
        ports:
          - name: kibana-port
            containerPort: 80
        dependencies: [elasticsearch]
labels:
  app: kibana-viewer

This would delay the scheduling of this pod until the pod(s) identified by the elasticsearch service are all in the running state.

@dchen1107 added the kind/design and kind/enhancement labels on Nov 14, 2014
@dchen1107
Member

cc/ @bgrant0607

@satnam6502 I totally agree that we need a way to describe the inter-dependencies between pods and services. But I am not sure we should introduce such dependency logic into Pod, the primitive object hosted by Kubernetes. I think such dependencies should be described at the config layer, and interpreted and handled by controllers.

@bgrant0607
Member

See also #1768

@stp-ip
Member

stp-ip commented Nov 14, 2014

As a minimal thought:
Dependency declaration can be quite messy. Building on top of dependencies being taken into account will make things messier than simply letting containers fail until their services are available.

@satnam6502
Contributor Author

What is messy about dependency declaration? I don't doubt it is -- I just don't have any experience or insight into this area to immediately understand why.

At a higher level, I think it is a good point of principle to declare dependencies somewhere so they can be used to understand the composition of the system and so this information can be used by the system and tools to enforce and maintain constraints. Right now the dependency of my Kibana pod on the Elasticsearch service is expressed deep inside the container when the environment variables ELASTICSEARCH_SERVICE_{HOST|PORT} are used.
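
To make that coupling concrete, here is a minimal sketch of where the dependency lives today; the kibana entrypoint and the ES_URL variable name are assumptions for illustration, not details of the actual image:

containers:
  - name: kibana-image
    image: kubernetes/kibana:latest
    command:
      - sh
      - -c
      # ELASTICSEARCH_SERVICE_HOST and ELASTICSEARCH_SERVICE_PORT are injected
      # by the kubelet, but only if the elasticsearch service already existed
      # when this pod was created.
      - ES_URL="http://${ELASTICSEARCH_SERVICE_HOST}:${ELASTICSEARCH_SERVICE_PORT}" exec kibana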

@lavalamp
Member

I tentatively agree with Dawn that one should express dependencies in the config layer; the config/deployment system can wait for readiness before deploying things that depend on other things.

Anyway, pods must be able to robustly handle their dependencies not being up, because this is going to happen from time to time even if you fix the startup order.
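
To illustrate what "handle the dependency not being up" can look like in practice, here is a minimal sketch of a wrapper command that retries with bounded backoff instead of exiting. It is written against the later v1 pod schema, and the wget check and the kibana entrypoint are assumptions rather than details from this issue:

apiVersion: v1
kind: Pod
metadata:
  name: kibana-pod
  labels:
    app: kibana-viewer
spec:
  containers:
    - name: kibana
      image: kubernetes/kibana:latest
      command: ["sh", "-c"]
      args:
        - |
          # Wait for Elasticsearch with bounded exponential backoff, then start.
          delay=1
          until wget -q -O /dev/null "http://${ELASTICSEARCH_SERVICE_HOST}:${ELASTICSEARCH_SERVICE_PORT}"; do
            echo "elasticsearch not reachable, retrying in ${delay}s"
            sleep "$delay"
            if [ "$delay" -lt 30 ]; then delay=$((delay * 2)); fi
          done
          exec kibana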

@stp-ip
Member

stp-ip commented Nov 15, 2014

I agree that it's nice to have dependencies in the config layer, but it sometimes leads to not handling dependencies inside the container/service. Additionally, sorting out the structure of a dependency tree can sometimes be unclear. Both approaches, failing hard and relying on declared dependencies, have their pros and cons.

@thockin
Member

thockin commented Nov 15, 2014

Daniel nails it. Sometimes things you depend on go down. You have to handle that. Startup deps are a crutch that makes you think you are resilient to failure until suddenly a real failure occurs and you are not.

@smarterclayton
Contributor

Yeah - building your services to die cleanly (or retry failed connections) is the only way out of this hole. Config can and should only hide so much.

@bgrant0607
Member

Also see #1899

@bgrant0607
Member

But, yes, as others have mentioned, we don't depend on startup order internally, and users shouldn't depend on it, either.

Until we have service DNS fully enabled, the one startup dependency we currently have is that the services need to be created first, so that the environment variables will be present in their clients' containers.

@satnam6502
Contributor Author

Is it not worth making a distinction between "birth behaviour" and "running as per usual" behaviour? Of course, sometimes things you depend on are unreliable and you need to be robust against that. However, when you know that something will fail, e.g. when giving birth to pods, why go ahead with it? In the usual mode of operation you don't know when a failure will occur. If you know that pod A depends on service S which has zero pods up to implement its interface, then you know that attempting to schedule A will fail -- so why waste steam doing that?

I don't strongly advocate making a distinction between bring-up behaviour and "regular" behaviour -- but I think it is something we ought to think about in order to improve the experience for our users.


@thockin
Member

thockin commented Nov 18, 2014

This is something we thought about internally and decided was not a great idea, despite its obviousness (it comes up again and again). I don't think we need to revisit this any further.

For the sake of the multi-hundred-item issues list, closing.

@thockin closed this as completed on Nov 18, 2014
@smarterclayton
Contributor

Can you add more detail about why it's not a great idea and what the solution is for the problems it was intended to solve?


@smarterclayton
Contributor

If there are any beyond those listed in the issue: Just want to have a good reference for future questions.


@junneyang

I suggest that Marathon's app dependencies and health checks could serve as a reference:
https://mesosphere.github.io/marathon/
https://mesosphere.github.io/marathon/docs/generated/api.html#v2_apps_post

@smarterclayton
Contributor

We've stated elsewhere that health and dependency preconditions should be handled in an init container (#1589). If we do that, we need to determine what use cases the dependencies would enable (a sketch of the init-container approach follows the list). We have the following:

1. As input to higher level controllers / UIs that can assist users (automatically generated or manual)
2. To parameterize the lookup of the dependency - should be env var replacement and DNS
3. To get at the service VIP - should be DNS
4. To get access to other service info, like a public hostname, or an external IP, or the cluster load balancer IP - we've debated whether this should be DNS

There may be others I'm missing.
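
For reference, a minimal sketch of the init-container approach mentioned above, written against the initContainers field as it later landed in the pod spec; the busybox image and the DNS check (which only proves the service object exists, not that its endpoints are ready) are illustrative, not prescribed by #1589:

apiVersion: v1
kind: Pod
metadata:
  name: kibana-pod
  labels:
    app: kibana-viewer
spec:
  initContainers:
    - name: wait-for-elasticsearch
      image: busybox
      command:
        - sh
        - -c
        # Hold back the main containers until the elasticsearch service name
        # resolves in cluster DNS.
        - until nslookup elasticsearch; do echo waiting for elasticsearch; sleep 2; done
  containers:
    - name: kibana
      image: kubernetes/kibana:latest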


@pbarker
Contributor

pbarker commented Nov 30, 2016

I think this should be a Helm task

@johanvdb

Just to add some context to this discussion (albeit very late). We have a three node cluster with about 40 pods, an equal mix of Java and MySQL. All services are built in a resilient manner: if they can't find their dependencies, they just retry with backoff. We can stop and start the entire cluster and eventually things become accessible. The only problem is the time it takes. If we start everything up at the same time, primary services (those that others depend on) take longer to start because of the high CPU and IO load caused by the other services starting up at the same time and constantly retrying to talk to the primary services. If we start everything up by hand, first the primary and then the secondary services, we can get the system running in half the time.

So, for me, wasting resources during startup seems an unnecessary burden on the system. During run time we face similar issues. If a primary service goes down, the health check for a secondary service will fail too, causing both the primary and the secondary service to restart. Ideally there would be some dependency definition that notices the primary is dead, restarts it first, and then rechecks the secondary once the primary is healthy again, before restarting the secondary.

In the java world (spring cloud), we have a central config service, one service to rule them all... If that service dies, the cluster goes ape, restarting everything, causing many minutes of downtime, when in reality it could have just restarted the config service.

I therefore support the concept of dependency management at a higher level for an effective startup sequence as well as better runtime health management.
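
Part of the restart cascade described above can already be softened with existing primitives by separating liveness from readiness, so a secondary that cannot reach its primary is removed from rotation instead of being restarted. A minimal sketch, with hypothetical image, port, and probe paths:

containers:
  - name: secondary
    image: example/secondary-service   # hypothetical image
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz   # checks only the process itself, so an unreachable
        port: 8080       # primary does not trigger a restart
    readinessProbe:
      httpGet:
        path: /readyz    # checks the dependency on the primary; failure here
        port: 8080       # only removes the pod from the service endpoints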
