
Fix plugin lock when restoring containers #25005

Closed
wants to merge 1 commit

Conversation

Contributor

@ohmystack ohmystack commented Jul 25, 2016

The problem occurs when starting dockerd with multiple containers that use the
same unavailable plugin.
The containers are restored one by one, not asynchronously, so dockerd
has to wait N * (~16 secs) to restore all the containers.
(N is the number of containers using the unavailable plugin.)

How to reproduce

Precondition: I have an rbd-docker-plugin, with the plugin name "rbd".

Step 1. Run the rbd-docker-plugin and dockerd.
Step 2. Create 2 containers using the rbd plugin:

docker create --name cont01 --volume-driver=rbd -v docker_volumes/web1:/aaa ubuntu:14.04 sleep 5
docker create --name cont02 --volume-driver=rbd -v docker_volumes/web1:/aaa ubuntu:14.04 sleep 5

Step 3. Stop the rbd-docker-plugin.
Step 4. Restart dockerd. (Now dockerd is unable to connect to plugin rbd.)

Then, you will find the dockerd logs like this:

WARN[0001] Unable to locate plugin: rbd, retrying in 1s
WARN[0002] Unable to locate plugin: rbd, retrying in 2s
WARN[0004] Unable to locate plugin: rbd, retrying in 4s
WARN[0008] Unable to locate plugin: rbd, retrying in 8s
ERRO[0016] get docker_volumes/web1: Error looking up volume plugin rbd: plugin not found
WARN[0016] Unable to locate plugin: rbd, retrying in 1s
WARN[0017] Unable to locate plugin: rbd, retrying in 2s
WARN[0019] Unable to locate plugin: rbd, retrying in 4s
WARN[0023] Unable to locate plugin: rbd, retrying in 8s
ERRO[0031] get docker_volumes/web1: Error looking up volume plugin rbd: plugin not found

In total, this takes 2 * 16 seconds = 32 secs.

Should be

Every container that waits for the plugin should start waiting at the same time, like this:

WARN[0001] Unable to locate plugin: rbd, retrying in 1s
WARN[0002] Unable to locate plugin: rbd, retrying in 2s
WARN[0004] Unable to locate plugin: rbd, retrying in 4s
WARN[0008] Unable to locate plugin: rbd, retrying in 8s
ERRO[0016] get docker_volumes/web1: Error looking up volume plugin rbd: plugin not found
ERRO[0016] get docker_volumes/web1: Error looking up volume plugin rbd: plugin not found

In total, this takes 16 seconds.

The cause

Though daemon.prepareMountPoints(c) runs in separate goroutines,
containers are still initialized synchronously because each goroutine
blocks at "GetWithRef".

When multiple goroutines call drivers.lookup for the same volume concurrently, the following happens (a simplified sketch follows this list):

  • There are 2 steps in lookup to initialize a volume driver.
    • Step 1. Get the plugin via plugin.LookupWithCapability(name, extName) and put it into storage.plugins[name].
    • Step 2. Put the plugin into drivers.extensions[name].
  • The wait loop (which waits ~16s in total) is in Step 1. lookup holds a lock (covering Step 1 + Step 2) at the very beginning, before Step 1's own lock, so the 2nd goroutine's Step 1 cannot enter its wait loop; every goroutine's Step 1 becomes synchronous.
    • When the 2nd goroutine's Step 1 finally runs, it doesn't know that the plugin with this name has already failed (in the 1st goroutine).
    • So the 2nd goroutine waits again, another 16s.
    • In total, it becomes N * 16s.
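To make the serialization concrete, here is a minimal, self-contained Go sketch (not the real moby code: lookupPlugin, driverLock, and the timings are placeholders). Because the per-name lock is held across the whole retry loop, the second goroutine cannot even start waiting until the first has burned its full backoff.

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// lookupPlugin stands in for plugin.LookupWithCapability plus its
// exponential-backoff retry loop (1s+2s+4s+8s, roughly 16s in the real
// daemon; shortened here so the example finishes quickly).
func lookupPlugin(name string) error {
	time.Sleep(150 * time.Millisecond) // pretend to retry and then give up
	return errors.New("plugin not found: " + name)
}

var driverLock sync.Mutex // stands in for the per-name lock taken at the top of lookup

// lookup mimics the problematic shape: the lock wraps both the slow Step 1
// (plugin lookup with backoff) and the fast Step 2 (registering the driver).
func lookup(name string) error {
	driverLock.Lock()
	defer driverLock.Unlock()
	// Step 1: slow, and it fails only after the whole backoff.
	if err := lookupPlugin(name); err != nil {
		return err
	}
	// Step 2 (store into drivers.extensions[name]) would go here on success.
	return nil
}

func main() {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two containers restoring volumes from the same "rbd" plugin
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println(lookup("rbd"))
		}()
	}
	wg.Wait()
	// Prints roughly 300ms, i.e. N * (backoff time): the waits are serialized.
	fmt.Println("elapsed:", time.Since(start).Round(10*time.Millisecond))
}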

Here's the fix

  1. The 1st goroutine holds the lock.
  2. The 2nd, 3rd, ... goroutines wait for the lock.
  3. If the 1st goroutine's Step 1 + Step 2 fails, it tells the 2nd, 3rd, ... goroutines to return an error instead of doing Step 1 + Step 2 again.

Only after getting the volume driver successfully should the volume be locked by name.

There is no need to lock drivers when looking up a driver, because
"lookup" actually does a GetOrCreate. When it calls
get(name string) in pkg/plugins/plugins.go, there is already a lock that
ensures the same plugin won't be registered twice. So we can remove the
outer lock in "lookup"; otherwise it blocks "prepareMountPoints".
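A hedged sketch of that idea (the names store, getPlugin, and extensions are placeholders; this is the shape of the approach rather than the actual diff): the slow lookup runs outside the store's lock, and only the map write is locked, relying on the plugin package's own lock to prevent double registration.

package volumedrivers

import "sync"

type driver struct{ name string }

type store struct {
	mu         sync.Mutex
	extensions map[string]*driver
}

func newStore() *store {
	return &store{extensions: make(map[string]*driver)}
}

// getPlugin stands in for get(name) in pkg/plugins: it holds its own lock, so
// the same plugin is never registered twice even without an outer lock here.
func getPlugin(name string) (*driver, error) {
	return &driver{name: name}, nil
}

func (s *store) lookup(name string) (*driver, error) {
	// Fast path: the driver is already registered.
	s.mu.Lock()
	if d, ok := s.extensions[name]; ok {
		s.mu.Unlock()
		return d, nil
	}
	s.mu.Unlock()

	// Step 1: the slow plugin lookup (with its retry backoff) runs WITHOUT an
	// outer lock, so concurrent callers wait in parallel instead of in series.
	d, err := getPlugin(name)
	if err != nil {
		return nil, err
	}

	// Step 2: register the driver, locking only the map access.
	s.mu.Lock()
	defer s.mu.Unlock()
	if existing, ok := s.extensions[name]; ok {
		return existing, nil // another goroutine won the race; reuse its driver
	}
	s.extensions[name] = d
	return d, nil
}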

@ohmystack
Contributor Author

@cpuguy83 Would you mind reviewing?

@cpuguy83
Member

@ohmystack I'll have to re-review the whole system. These locks are there for a reason, so just removing them seems dangerous without careful review.

Also, I'm not sure how this would change the sequential-ness, since it still needs to wait while doing the lookup. It seems you've just changed where it waits... but again, I'll have to go through the whole system.

Also ping @anusha-ragunathan

@anusha-ragunathan
Contributor

anusha-ragunathan commented Jul 26, 2016

lookup() uses two types of locks:

  • driverLock is of type locker.Locker, a granular lock which blocks any other access to the given driver "name".
    drivers.driverLock.Lock(name)
    defer drivers.driverLock.Unlock(name)
  • drivers.Lock() protects the entire "extensions" map. It's a global lock. Any access to the extensions map should use this global lock.
    drivers.Lock()
    ext, ok := drivers.extensions[name]
    drivers.Unlock()

Both are necessary to prevent races. Refer to https://github.com/docker/docker/blob/3d13fddd2bc4d679f0eaa68b0be877e5a816ad53/pkg/locker/README.md for more info.
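For readers following along, here is a simplified sketch of how the two locks described above fit together in a lookup-style function. The type and field names are placeholders; only the pkg/locker API (New, Lock, Unlock) is taken from the repository.

package volumedrivers

import (
	"fmt"
	"sync"

	"github.com/docker/docker/pkg/locker"
)

// Driver is a stand-in for the volume driver interface.
type Driver interface {
	Name() string
}

type driverStore struct {
	sync.Mutex                   // the global lock: protects the extensions map
	driverLock *locker.Locker    // the granular lock: serializes work per driver name
	extensions map[string]Driver
}

func newDriverStore() *driverStore {
	return &driverStore{
		driverLock: locker.New(),
		extensions: make(map[string]Driver),
	}
}

func (d *driverStore) lookup(name string) (Driver, error) {
	d.driverLock.Lock(name) // blocks only callers asking for the same name
	defer d.driverLock.Unlock(name)

	d.Lock() // any access to the extensions map goes through the global lock
	ext, ok := d.extensions[name]
	d.Unlock()
	if ok {
		return ext, nil
	}

	// The slow plugin lookup (with its retry backoff) and the registration
	// into d.extensions would happen here, still under driverLock for "name".
	return nil, fmt.Errorf("volume driver %q not found", name)
}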

@ohmystack
Contributor Author

ohmystack commented Jul 27, 2016

@anusha-ragunathan Agree. drivers.driverLock.Lock(name) protects the "GetOrCreate" action for specific drivers.extensions[name].

After looking more carefully, I find the problem is this:

  • There are 2 steps in lookup to initialize a volume driver.
    • Step 1. Get the plugin via plugin.LookupWithCapability(name, extName) and put it into storage.plugins[name].
    • Step 2. Put the plugin into drivers.extensions[name].
  • The wait loop (which waits ~16s in total) is in Step 1. lookup holds a lock at the beginning to prevent Step 2 races; however, this lock is taken before Step 1, so the 2nd goroutine's Step 1 cannot enter its wait loop, and Step 1 becomes synchronous.
    • When the 2nd goroutine's Step 1 finally runs, it doesn't know that the plugin with this name has already failed (in the 1st goroutine).
    • So the 2nd goroutine waits again, another 16s.
    • In total, it becomes N * 16s.

So, the fix can be:

  1. The 1st goroutine holds the lock.
  2. The 2nd, 3rd, ... goroutines wait for the lock.
  3. If the 1st goroutine's Step 1 fails, it tells the 2nd, 3rd, ... goroutines to return an error instead of doing Step 1 again.

Do you agree with this logic?
I'll push an updated commit soon.

@ohmystack
Contributor Author

@cpuguy83 @anusha-ragunathan

  • I reverted the change in my old commit.
  • Added a "SetError" method to pkg/locker/locker.go, which tells the other goroutines that are waiting for the lock to do the same thing to give up and not repeat the failing operation.

What do you think about this change? (If it looks OK, I will squash the commits.)

@ohmystack
Contributor Author

ohmystack commented Jul 29, 2016

About the method name: maybe CancelWithError is a better name than SetError?

CancelWithError indicates that the other goroutines waiting for the lock will get an error instead of acquiring the lock; their locker.Lock(name) call is canceled by this caller.
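A hedged, self-contained sketch of what such a cancelable per-name lock and its caller could look like. Everything here is illustrative: in the PR itself the existing Lock was changed to return an error (see the diff comments below), whereas this sketch uses a separate LockWithError name, keeps no waiter count, and leaves the error sticky for simplicity.

package locker

import "sync"

// nameLock is one named lock; err is set by CancelWithError and observed by
// goroutines that were waiting in LockWithError.
type nameLock struct {
	mu  sync.Mutex
	err error
}

// CancelableLocker is a per-name lock whose holder can fail all waiters.
type CancelableLocker struct {
	mu    sync.Mutex
	locks map[string]*nameLock
}

func NewCancelable() *CancelableLocker {
	return &CancelableLocker{locks: make(map[string]*nameLock)}
}

func (l *CancelableLocker) get(name string) *nameLock {
	l.mu.Lock()
	defer l.mu.Unlock()
	nl, ok := l.locks[name]
	if !ok {
		nl = &nameLock{}
		l.locks[name] = nl
	}
	return nl
}

// LockWithError blocks like a normal Lock, but if a previous holder gave up
// via CancelWithError, the waiter gets that error back and does NOT hold the
// lock afterwards.
func (l *CancelableLocker) LockWithError(name string) error {
	nl := l.get(name)
	nl.mu.Lock()
	if nl.err != nil {
		err := nl.err
		nl.mu.Unlock() // pass the lock on so the next waiter sees the error too
		return err
	}
	return nil
}

// Unlock releases the named lock on the success path.
func (l *CancelableLocker) Unlock(name string) {
	l.get(name).mu.Unlock()
}

// CancelWithError records err and releases the lock; the waiters then wake up
// one by one, see the error, and return it instead of repeating the slow job.
// (Simplification: here the error stays set forever; the proposal in this PR
// only cancels the goroutines that are already waiting for this name.)
func (l *CancelableLocker) CancelWithError(name string, err error) {
	nl := l.get(name)
	nl.err = err
	nl.mu.Unlock()
}

// Example caller: the first goroutine pays the ~16s plugin backoff once; the
// later ones fail fast with the same error.
func lookupDriver(l *CancelableLocker, name string, slowLookup func(string) error) error {
	if err := l.LockWithError(name); err != nil {
		return err // canceled by an earlier goroutine; do not Unlock here
	}
	if err := slowLookup(name); err != nil {
		l.CancelWithError(name, err) // releases the lock and fails the waiters
		return err
	}
	l.Unlock(name)
	return nil
}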

@ohmystack
Contributor Author

CC @thaJeztah

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Aug 1, 2016
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Aug 1, 2016
@ohmystack
Contributor Author

From the Jenkins logs, 2 tests failed because of "No space left on device" on the test machines... 😓
I'll re-trigger the tests later.

@@ -90,6 +93,13 @@ func (l *Locker) Lock(name string) {
// once locked then we can decrement the number of waiters for this lock
nameLock.Lock()
Member

Isn't this where it's going to block at?

I think we'll need to consider a different implementation, possibly a condition variable that can broadcast to all waiters on error or signal a single waiter when the lock is ready.

Contributor Author

@ohmystack ohmystack Aug 1, 2016

Yes, @cpuguy83. All goroutines block here, but only one can call CancelWithError, and that goroutine unlocks the lock. After that, one of the other goroutines continues from this line, checks the error, gives up the lock, and returns the error, as if its lock were canceled. Though the waiters are canceled one by one, it is very quick because the check happens inside locker.Locker for "name".

As an alternative, like you said, I've thought about using sync.Cond to replace the sync.Mutex:

locker.Lock("name")
if locker.IsFirstOneGetTheLock("name") {   // Maybe mix sync.Once to do this
  // do the stuff, get the plugin (wait 16s), get err
  if err != nil {
    locker.SetError("name", err)
  }
  locker.Broadcast("name")
} else {
  for condition()  {    // How to define the condition here?
    locker.Wait("name")
  }
  locker.Unlock("name)
}

Its advantage is that pkg/locker looks much more like a "lock" package. However, it is more complicated and harder to use.

Member

No need to expose the condition variable as the API, only need to use it internally instead of using a literal sync.Mutex.

Contributor Author

I got it. But I have a concern: if we put the "only do once during a locker[name]'s lifecycle" logic inside pkg/locker, the package may not be so universal.
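For reference, a hedged sketch of what keeping the condition variable internal could look like (illustrative names only, not moby code): Lock and Unlock keep their usual shape, a Broadcast wakes every waiter at once when the holder cancels, and nothing cond-related appears in the public API.

package locker

import "sync"

// condLock is the internal state for one name; the condition variable never
// leaves the package, matching the suggestion above.
type condLock struct {
	mu     sync.Mutex
	cond   *sync.Cond
	locked bool
	err    error // set once the owning goroutine gives up
}

// CondLocker is a per-name lock built on sync.Cond instead of a bare Mutex.
type CondLocker struct {
	mu    sync.Mutex
	locks map[string]*condLock
}

func NewCond() *CondLocker {
	return &CondLocker{locks: make(map[string]*condLock)}
}

func (l *CondLocker) get(name string) *condLock {
	l.mu.Lock()
	defer l.mu.Unlock()
	c, ok := l.locks[name]
	if !ok {
		c = &condLock{}
		c.cond = sync.NewCond(&c.mu)
		l.locks[name] = c
	}
	return c
}

// Lock blocks until the named lock is free, or returns the error that a
// previous holder broadcast via CancelWithError (in which case the caller
// does NOT hold the lock).
func (l *CondLocker) Lock(name string) error {
	c := l.get(name)
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.locked && c.err == nil {
		c.cond.Wait()
	}
	if c.err != nil {
		return c.err
	}
	c.locked = true
	return nil
}

// Unlock releases the lock and wakes a single waiter.
func (l *CondLocker) Unlock(name string) {
	c := l.get(name)
	c.mu.Lock()
	c.locked = false
	c.mu.Unlock()
	c.cond.Signal()
}

// CancelWithError releases the lock and wakes every waiter at once; each of
// them returns err from Lock instead of acquiring the lock. (As in the other
// sketch, the error is kept sticky here for simplicity.)
func (l *CondLocker) CancelWithError(name string, err error) {
	c := l.get(name)
	c.mu.Lock()
	c.locked = false
	c.err = err
	c.mu.Unlock()
	c.cond.Broadcast()
}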

@cpuguy83
Member

cpuguy83 commented Aug 1, 2016

The behavior still feels a little strange, but I see what it's trying to accomplish.

@anusha-ragunathan This shouldn't be a problem at all with the new plugin arch, should it?

@ohmystack
Contributor Author

ohmystack commented Aug 1, 2016

@cpuguy83 @anusha-ragunathan I think the new one will be fine.

In the new plugin arch,

  • Step 1 and Step 2 are separated. This is good: cmd/dockerd/daemon.go does pluginInit before any containers are restored, so containers don't have to worry about the startup order or wait 16s.
  • The get in the new plugin/manager.go only does the "get" action, so the lock is held very briefly. The old plugin arch tries to load at the end of get, and that load takes a 16s backoff.

HOWEVER, if old plugins run in the new arch without plugins.json, or they are not listed in plugins.json, the problem is still there. The daemon will still block at daemon.prepareMountPoints(c).

I have built with DOCKER_EXPERIMENTAL=1 and tested it.

drivers.driverLock.Lock(name)
if err := drivers.driverLock.Lock(name); err != nil {
return nil, fmt.Errorf("Error looking up volume plugin %s: %v", name, err)
}
defer drivers.driverLock.Unlock(name)
Contributor

Lock() failure already Unlocks. Calling Unlock on defer on an already Unlocked lock will cause runtime errors.

Contributor Author

But the defer is after the if err check.
If there is an error, it returns immediately at "line 96", so there is no chance to run the "line 98" defer Unlock.

Contributor

My bad, what you have is fine. I'm used to seeing this format:
err := Lock()
defer Unlock()
if err != nil {
    // handle error
}

@anusha-ragunathan
Contributor

The new plugin arch doesn't have the exponential-backoff retry logic like legacy; all plugins are started alongside the daemon. Old plugins cannot run using the new infra. They need to be rewritten.

If a goroutine holds a lock for a long time to do a job and it gets
an error, this goroutine should tell the others, who are also waiting for
this lock to do the same job, that the job will return an error, so they
don't waste their time again. This happens when looking up a volume plugin:
every time, dockerd waits 16s without knowing that the same volume/plugin
has already failed before.

CancelWithError indicates that other goroutines waiting for the lock will
get an error instead of acquiring the lock; their locker.Lock(name) call is
canceled by this caller.

Signed-off-by: ohmystack <jun.jiang02@ele.me>
@ohmystack
Contributor Author

I did a squash, and re-triggered the tests.

@ohmystack
Contributor Author

According to #23446

Note: backwards compatibility with existing plugins is maintained

Though the new arch is fine, this fix is still needed for legacy volume plugins.

@anusha-ragunathan
Contributor

@ohmystack: What "backwards compatibility with existing plugins is maintained" means is that the new plugin arch will not invalidate legacy plugins; legacy plugins will continue to work in parallel with the new plugins.

@ohmystack
Contributor Author

I see. Let me put it this way: the new plugin arch is fine, but this bug still needs to be fixed for legacy plugins.

@@ -309,16 +309,20 @@ func (s *VolumeStore) create(name, driverName string, opts, labels map[string]st
// This makes sure there are no races between checking for the existence of a volume and adding a reference for it
func (s *VolumeStore) GetWithRef(name, driverName, ref string) (volume.Volume, error) {
name = normaliseVolumeName(name)
Contributor

@anusha-ragunathan anusha-ragunathan Aug 2, 2016

What is the intent of the changes in store.go? Are you trying to fix all callsites of Locker.Lock to set and catch appropriate errors? If so, there's more (eg, libcontainer/client.go and even VolumeStore's Get(), CreateWithRef(), etc) and I'd recommend doing that in a separate PR.

Contributor Author

@ohmystack ohmystack Aug 2, 2016

These two locks prevent different races, which you analyzed in previous comments. If a lock can be held for a long time, such as 16s, we should catch the error to prevent the failure from happening again and again. That is why I catch the error at these 2 places; these 2 locks may take 16s until they unlock.

If the lock action is very quick, it is totally fine to ignore the error.
If a lock uses CancelWithError, every Lock call on that lock should be wrapped in an if err block; if the lock doesn't use CancelWithError, it is fine to ignore the error.

Contributor Author

And I think you're right, maybe there are many "slow" locks in other places. It is a good idea to fix them in another PR.

Contributor

GetDriver() calls lookup(), which can be a long-running operation. There are other calls that hold a Locker.Lock and call GetDriver, for example Remove().

Contributor Author

@ohmystack ohmystack Aug 2, 2016

Makes sense. Maybe in the next PR? This time, restoring containers at daemon start only needs these 2 places. If I remove the GetWithRef part, the fix won't work anymore... 😥

Contributor

Sounds good.

Contributor

I'm thinking it would be better if there's a separate version of Lock called LockWithError() that checks for error. Callers that use this version should always use CancelWithError() to set errors. This would be cleaner. What do you think?

Contributor Author

@ohmystack ohmystack Aug 3, 2016

Maybe not...
Lock() and LockWithError() cannot be mixed for one named lock. Once the lock is created in locks[name], it should be either "cancelable" or "un-cancelable".

So another bool field, cancelable, needs to be added to the lockCtr struct, and then we need to check it in both the old Lock() and the new LockWithError().
When writing the logic, if Lock() meets a cancelable locker, the problem appears: the old Lock() cannot return an error to indicate that it is locking the wrong type of locker...

Besides, adding a new LockWithError() method still cannot ensure that developers will catch the error correctly and not call Unlock after LockWithError() returns an error.

Contributor

@anusha-ragunathan anusha-ragunathan Aug 3, 2016

When Lock() encounters a cancelable locker, it can panic. This will ensure that the developer consciously chooses a cancelable Lock vs a normal Lock. What we don't want is to turn all locks into cancelable locks.

If the developer doesn't catch the LockWithError() error, then Unlock will result in a runtime error, so that's okay.
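A hedged sketch of the guard being discussed here, building on the same idea as the earlier sketches (hypothetical names, not shipped pkg/locker code): the per-name lockCtr remembers which style it was first used with, plain Lock panics when it meets a cancelable lock (the mismatch is treated symmetrically here), and a caller that ignores LockWithError's error and still unlocks hits a runtime error, exactly like unlocking an unlocked sync.Mutex.

package locker

import "sync"

// lockCtr is the per-name state: cancelable records which API style first
// claimed this name; err is the error handed out by CancelWithError.
type lockCtr struct {
	mu         sync.Mutex
	cancelable bool
	err        error
}

type Locker struct {
	mu    sync.Mutex
	locks map[string]*lockCtr
}

func New() *Locker {
	return &Locker{locks: make(map[string]*lockCtr)}
}

// get fetches or creates the lockCtr for name. The first use fixes the style;
// a mismatch panics, so a developer cannot mix Lock and LockWithError on the
// same name by accident.
func (l *Locker) get(name string, wantCancelable bool) *lockCtr {
	l.mu.Lock()
	defer l.mu.Unlock()
	c, ok := l.locks[name]
	if !ok {
		c = &lockCtr{cancelable: wantCancelable}
		l.locks[name] = c
	}
	if c.cancelable != wantCancelable {
		panic("locker: mixed Lock/LockWithError usage on " + name)
	}
	return c
}

// Lock is the classic blocking API for non-cancelable names.
func (l *Locker) Lock(name string) {
	l.get(name, false).mu.Lock()
}

// LockWithError is the cancelable variant; callers must check the error and
// must NOT Unlock when an error is returned. Ignoring that rule means
// unlocking an unlocked mutex, which is a runtime error (the safety net
// mentioned above).
func (l *Locker) LockWithError(name string) error {
	c := l.get(name, true)
	c.mu.Lock()
	if c.err != nil {
		err := c.err
		c.mu.Unlock()
		return err
	}
	return nil
}

// Unlock releases the named lock for either style.
func (l *Locker) Unlock(name string) {
	l.mu.Lock()
	c, ok := l.locks[name]
	l.mu.Unlock()
	if !ok {
		return // the real pkg/locker returns ErrNoSuchLock in this case
	}
	c.mu.Unlock()
}

// CancelWithError sets err and releases the lock so the waiters fail fast.
func (l *Locker) CancelWithError(name string, err error) {
	c := l.get(name, true)
	c.err = err
	c.mu.Unlock()
}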

@ohmystack
Contributor Author

ohmystack commented Aug 4, 2016

My whole fix, "cancel a lock", might be a bad idea.
If we use CancelWithError(), there are 2 problems:

  • Remove in volume/store/store.go: its error has more details (the refs), which cannot be reported to the other lock waiters, so the other Remove waiters cannot return the same full error message as the first one.
  • A lock may be used by different methods at the same time. The cancel may take effect on a different method and lead to a bad result, e.g.:
 s.Remove              | s.Get
-----------------------+------------------------------------------------
 Lock                  |
                       | Lock
 Cancel:errVolumeInUse | wait...
 Unlock                | wait...
                       | gets the lock, but returns "errVolumeInUse"   <- canceled by a different method's error

We need to find another way to fix this bug.
@anusha-ragunathan What do you think? Maybe it's time to close this PR. Thanks for your support!

@ohmystack ohmystack closed this Aug 5, 2016
@thaJeztah
Member

Thanks @ohmystack, sorry that it didn't work
