
Daemon Restart: attempt to wait for container deps #18208

Merged: 1 commit merged into moby:master from the restart_links branch, Jan 4, 2016

Conversation

cpuguy83 (Member):

This provides a best effort on daemon restarts to restart containers
which have linked containers that are not up yet instead of failing.

// Best effort to wait for container deps to be up
switch e := err.(type) {
case errcode.Error:
	if e.ErrorCode() == derr.ErrorCodeLinkNotRunning {
Contributor:
you can also invert it: if e.ErrorCode() != derr.ErrorCodeLinkNotRunning { break }, and remove the conditional a few lines below 😸

Contributor:
I also think we have a function to check those error codes without adding switch statements everywhere, but I cannot find it now.

cpuguy83 (Member, Author):
@duglin ?

Also, I have the switch because I was also trying to fix another case where overlay networks aren't ready -- but there appears to be a deeper issue in libnetwork that needs to get fixed first.

Contributor:

I think at one point I did have a util func to avoid making everyone write switch statements but it never made the cut.

@cpuguy83 cpuguy83 force-pushed the restart_links branch 2 times, most recently from c3f52e1 to fa70c7f Compare November 24, 2015 21:02
	}
	err = nil
}
if err != nil {
Contributor:
this is not necessary anymore 😸

@calavera (Contributor):

nevermind about the function, I might have dreamed about it

@cpuguy83 (Member, Author):

@calavera Updated

// This file contains all of the errors that can be generated from the
// docker engine but are not tied to any specific top-level component.

const errGroup = "engine"

// IsErr returns a bool indicating if the passed in error matches the expected error
func IsErr(err error, expected errcode.ErrorCode) bool {
Contributor:

this is a really bad func name if you think about it ;)

Maybe use something more descriptive, but seeing that this is not used in this PR, I would remove this file from the PR.

cpuguy83 (Member, Author):
It is used in this PR

@cpuguy83 (Member, Author):

@crosbymichael updated the fn name.

@calavera (Contributor):

LGTM

Edited:

I was thinking about an alternative approach:

We put the containers that fail to start in a queue instead of retrying. When all the other containers have finished, we try to restart the failed ones again sequentially. I don't know if it's better, what do you think?
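The queue-then-retry alternative can be sketched as below. This is an illustrative model, not docker's code: `container`, `startAll`, and the dependency check are invented names, and real containers would be started rather than tracked in a map.

```go
package main

import "fmt"

// container models a container and the linked containers it depends on.
// All names here are illustrative, not docker's.
type container struct {
	name string
	deps []string
}

// startAll starts every container once, queues the failures, then retries
// the queue sequentially after everything else has had a chance to come up,
// as the comment above proposes.
func startAll(containers []container) []string {
	up := map[string]bool{}
	canStart := func(c container) bool {
		for _, d := range c.deps {
			if !up[d] {
				return false
			}
		}
		return true
	}

	var order []string
	var queue []container
	for _, c := range containers {
		if canStart(c) {
			up[c.name] = true
			order = append(order, c.name)
		} else {
			queue = append(queue, c) // a link is not running yet; retry later
		}
	}
	// Sequential retry pass over the containers that failed to start.
	for _, c := range queue {
		if canStart(c) {
			up[c.name] = true
			order = append(order, c.name)
		}
	}
	return order
}

func main() {
	// web depends on db, but appears first in the restore order.
	fmt.Println(startAll([]container{
		{name: "web", deps: []string{"db"}},
		{name: "db"},
	})) // [db web]
}
```

One retry pass is enough only when dependency chains are short; deeper chains would need to loop until the queue stops shrinking.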

@crosbymichael (Contributor):

you can easily build a little dependency graph to fix the start order instead of thrashing the containers trying to start them.
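The dependency-graph idea amounts to a small topological sort over container links. A hedged sketch using Kahn's algorithm, with invented names (`startOrder`, the `links` map); docker's actual containers and link metadata are more involved:

```go
package main

import "fmt"

// startOrder topologically sorts containers by their links (Kahn's
// algorithm), so dependencies start before the containers that link to
// them. links maps a container name to the names it depends on.
func startOrder(links map[string][]string) []string {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for c, deps := range links {
		indegree[c] = len(deps)
		for _, d := range deps {
			dependents[d] = append(dependents[d], c)
		}
	}
	var ready []string
	for c, n := range indegree {
		if n == 0 {
			ready = append(ready, c)
		}
	}
	var order []string
	for len(ready) > 0 {
		c := ready[0]
		ready = ready[1:]
		order = append(order, c)
		for _, dep := range dependents[c] {
			indegree[dep]--
			if indegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	return order // containers caught in a link cycle are left out
}

func main() {
	order := startOrder(map[string][]string{
		"web":   {"db", "cache"},
		"db":    {},
		"cache": {},
	})
	fmt.Println(order[len(order)-1]) // web: it links to everything else, so it starts last
}
```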

@cpuguy83 (Member, Author):

PTAL

}

for _, c := range children {
	if _, err := c.waitRunning(1 * time.Second); err != nil {
cpuguy83 (Member, Author):
I just picked a duration here.

@cpuguy83 (Member, Author):

Going for something simple, but I think this might be too simple.
I'll have to work on something else. Closing for now.

@cpuguy83 cpuguy83 closed this Nov 25, 2015
@cpuguy83 cpuguy83 reopened this Nov 25, 2015
@cpuguy83 cpuguy83 force-pushed the restart_links branch 3 times, most recently from b562af0 to f572850 Compare November 25, 2015 18:51
@cpuguy83 (Member, Author):

Ok, PTAL.

@crosbymichael (Contributor):

This is closer but there is too much concurrency where it is not needed

@cpuguy83 (Member, Author):

Ah, good point, updated to make waiting on children happen in one goroutine per container instead of for each child.
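That concurrency shape (one goroutine per container, children waited on sequentially inside it) can be sketched as below. `startContainers` and `waitChild` are invented names standing in for the daemon's restore loop and `waitRunning`.

```go
package main

import (
	"fmt"
	"sync"
)

// startContainers launches one goroutine per container; that goroutine
// waits on each of the container's children sequentially, instead of
// spawning an extra goroutine per child. waitChild stands in for waiting
// until a linked container is running.
func startContainers(children map[string][]string, waitChild func(name string)) []string {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		started []string
	)
	for name, deps := range children {
		wg.Add(1)
		go func(name string, deps []string) { // one goroutine per container
			defer wg.Done()
			for _, d := range deps { // children are waited on sequentially
				waitChild(d)
			}
			mu.Lock()
			started = append(started, name)
			mu.Unlock()
		}(name, deps)
	}
	wg.Wait()
	return started
}

func main() {
	started := startContainers(
		map[string][]string{"web": {"db", "cache"}, "worker": {"db"}},
		func(string) {}, // pretend every child is already up
	)
	fmt.Println(len(started)) // 2
}
```

Containers still restore concurrently with each other, but the per-child fan-out that prompted the "too much concurrency" comment is gone.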

@tiborvass (Contributor):

@cpuguy83 needs rebase

@cpuguy83 cpuguy83 force-pushed the restart_links branch 2 times, most recently from 0f9315f to bd0c54b Compare December 8, 2015 01:55
@cpuguy83 (Member, Author) commented Dec 8, 2015:

rebased

@@ -353,17 +353,43 @@ func (daemon *Daemon) restore() error {
// The container register failed should not be started.
return
}
}(c.container, c.registered)
Contributor:
Does this "upper" goroutine still need to exist now that the actual container start operations are in the "lower" goroutine (added below) with the range restartContainers loop?

I tried to dig through daemon.Register and daemon.generateNewName (which seem to be the only external calls in the goroutine now) to see if there is any reason for this to be handled per-container in a concurrent manner, and nothing jumps out at me.

cpuguy83 (Member, Author):
Hmmm, you are probably right...
Maybe restarting the daemon after an unclean shutdown would be a bit slower (making a few syscalls to clean up)...

@cpuguy83 (Member, Author):

@estesp Updated with your suggestion.

@estesp (Contributor) commented Dec 17, 2015:

This looks a lot better to me without two loops of goroutines. Not sure if @crosbymichael wanted to take another look given the comment re: "too much concurrency"?

@thaJeztah (Member):

ping @calavera @crosbymichael if you can have another look?

This provides a best effort on daemon restarts to restart containers
which have linked containers that are not up yet instead of failing.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 (Member, Author):

Rebased

@calavera (Contributor):

LGTM

// running before we try to start the container
children, err := daemon.children(container.Name)
if err != nil {
	logrus.Warnf("error getting children for %s: %v", container.Name, err)
Contributor:
should have a return here or you will probably get a panic from a nil container

cpuguy83 (Member, Author):
I don't think we'll hit a panic here: we aren't doing anything with the values except checking whether we have a start notifier for them later, which can be nil and be ok, and if children is nil that's fine too, because we just range over it.

I'd also prefer not to return here, since that would make the container never start.
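The nil-safety being relied on here is a language guarantee, not something special about this code: ranging over a nil slice runs the loop body zero times, and indexing a nil map yields the zero value. A small standalone demonstration (the variable names are illustrative):

```go
package main

import "fmt"

func main() {
	// Ranging over a nil slice is safe: the body just never runs. This is
	// why the missing return after the Warnf does not cause a panic when
	// the children lookup fails and leaves the slice nil.
	var children []string // nil, as when the lookup errors
	count := 0
	for range children {
		count++
	}
	fmt.Println(count) // 0

	// Likewise, reading a missing key from a nil map yields the zero
	// value, so a nil map of start notifiers is also safe to consult.
	var notifiers map[string]chan struct{}
	fmt.Println(notifiers["web"] == nil) // true
}
```

Writing to a nil map would panic, but this path only reads, so the early return really is unnecessary.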

Contributor:
ok, i see that now. thanks

@crosbymichael (Contributor):

LGTM

crosbymichael added a commit that referenced this pull request Jan 4, 2016
Daemon Restart: attempt to wait for container deps
@crosbymichael crosbymichael merged commit 04234bd into moby:master Jan 4, 2016
@cpuguy83 cpuguy83 deleted the restart_links branch January 5, 2016 15:56
@thaJeztah thaJeztah added this to the 1.10.0 milestone Feb 11, 2016
8 participants