runtime/v2/runc: handle early exits w/o big locks #8617
Conversation
Hi @corhere. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@corhere: GitHub didn't allow me to request PR reviews from the following users: laurazard. Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 1476721 to 16698bf
I was able to get rid of the timed garbage collector by leveraging the fact that the only early exits which matter for a given s.Start call are the exits received between container.Start() and the PID being added to s.running. I will squash down the branch if tests pass and reviewers are happy with the new approach.
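For illustration, here is a minimal, self-contained sketch of that window, with hypothetical names (`start`, `exitCh`) that stand in for the real shim types; the actual implementation is structured quite differently:

```go
package main

import "fmt"

// service stands in for the shim's task service; running stands in for the
// real pid -> containerProcess bookkeeping map.
type service struct {
	running map[int]bool
}

// start sketches the critical window: exit notifications are observable
// before the process starts, so an exit delivered before the PID lands in
// s.running can still be noticed and handled by this Start call.
func (s *service) start(startProcess func() int, exitCh <-chan int) {
	pid := startProcess() // exits may arrive from this moment on

	// Check whether the process exited before we could record it.
	select {
	case exited := <-exitCh:
		if exited == pid {
			fmt.Println("early exit handled by Start:", pid)
			return
		}
	default:
	}

	s.running[pid] = true // normal exit handling takes over from here
}

func main() {
	s := &service{running: map[int]bool{}}
	exitCh := make(chan int, 1)
	s.start(func() int {
		exitCh <- 42 // the process dies immediately after starting
		return 42
	}, exitCh)
}
```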
Looks like one commit is missing a DCO sign-off.
Force-pushed from 53dca61 to adbb1b1
@laurazard: GitHub didn't allow me to request PR reviews from the following users: PTAL. Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks like this caused TestRegressionIssue4769 to go a little haywire; it's failing for the runc, crun, and rockylinux runs. @laurazard, can you look into this?
Aaahh, I see and can replicate it locally. I'll take a look, thanks @dcantah.
Force-pushed from adbb1b1 to fc17ca0
runtime/v2/runc/task/service.go (Outdated)

```diff
@@ -113,7 +113,7 @@ type service struct {
 	containers map[string]*runc.Container

 	lifecycleMu sync.Mutex
-	running map[int][]containerProcess // pid -> running process, guarded by lifecycleMu
+	running map[int]map[containerProcess]struct{} // pid -> running process, guarded by lifecycleMu
```
Commenting here to get some input: the fix for the failure mentioned in #8617 (comment) has to do with us creating duplicate entries for the same `containerProcess` in `s.running` (first in `s.Create`, then in `s.Start`), which then causes us to send exit events for the same `containerProcess` twice.

I fixed this in c43966c by replacing the `containerProcess` slice with a map, and using the `containerProcess` as the key so that we don't keep the same `containerProcess` there twice. This works fine, but I wonder if we can make this simpler: I don't think we can assume `pid`s are unique, although if we could, we could do:

```diff
-running map[int]map[containerProcess]struct{} // pid -> running process, guarded by lifecycleMu
+running map[int]containerProcess // pid -> running process, guarded by lifecycleMu
```

If we can't assume that, I still wonder if we should do something like

```diff
-running map[int]map[containerProcess]struct{} // pid -> running process, guarded by lifecycleMu
+running map[int]map[string]containerProcess // pid -> running process, guarded by lifecycleMu
```

where the key for the inner map is something like `container.ID`, since I think containerID+processID should be unique.
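To illustrate the map-of-maps idea, a small self-contained sketch (all names here are stand-ins for the shim's real types) showing how keying the inner map by container ID makes the duplicate add from `s.Start` a no-op:

```go
package main

import "fmt"

// containerProcess is a stand-in for the shim's real type.
type containerProcess struct {
	ContainerID string
	Pid         int
}

// running: pid -> containerID -> process.
type running map[int]map[string]containerProcess

func (r running) add(cp containerProcess) {
	if r[cp.Pid] == nil {
		r[cp.Pid] = make(map[string]containerProcess)
	}
	r[cp.Pid][cp.ContainerID] = cp // a duplicate add overwrites instead of appending
}

func main() {
	r := running{}
	cp := containerProcess{ContainerID: "c1", Pid: 42}
	r.add(cp)               // added in Create
	r.add(cp)               // added again in Start: still a single entry
	fmt.Println(len(r[42])) // 1
}
```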
@corhere thoughts?
Silly question: if `pid` here is `containerProcess.Process.Pid`, could it then be `map[containerProcess]struct{}`? (Or `map[containerID]map[containerProcess]struct{}` if we need to index per container?)
It's not :'(. We have multiple processes per container, and when we get the exit event back from runC we only get the specific process PID, so the "outer" map must be keyed by `pid`: when we get an exit event, we check our `s.running` map with the event's PID, find the `containerProcess`es we know are running for that PID, and process the exit event for them.
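A tiny sketch of that lookup shape, with illustrative names only:

```go
package main

import "fmt"

// containerProcess is a stand-in for the shim's real type.
type containerProcess struct{ ContainerID string }

// handleExit sketches the lookup described above: the exit event carries
// only a PID, so the outer map is keyed by PID, and one PID may map to
// several containerProcesses.
func handleExit(running map[int][]containerProcess, pid, status int) {
	for _, cp := range running[pid] {
		fmt.Printf("TaskExit: container %s pid %d status %d\n", cp.ContainerID, pid, status)
	}
	delete(running, pid)
}

func main() {
	running := map[int][]containerProcess{
		42: {{ContainerID: "c1"}},
	}
	handleExit(running, 42, 0)
}
```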
Ah, ugh... gotcha
@laurazard and I have concluded that the duplicate entries in `s.running` are really a symptom of an underlying bug in my implementation, which could manifest as an exit event being published before a start event in certain cases. Deduplicating the `running` map values by `containerProcess` does not address the root cause. The correct solution, we believe, is to have `(*service).Start()` remove the container init process from `s.running` before calling `container.Start()` and add it back afterwards, so that a process which exits before `(*service).Start()` publishes the `TaskStart` event is correctly handled as an early exit.
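A minimal sketch of that ordering, assuming simplified stand-in types (the actual `(*service).Start()` is considerably more involved):

```go
package main

import "sync"

type containerProcess struct{ containerID string }

type service struct {
	lifecycleMu sync.Mutex
	running     map[int]containerProcess
}

// startInit sketches the fix: the init process is taken out of the running
// map before the container is started, so an exit that arrives before the
// TaskStart event is published takes the early-exit path instead of racing
// the start event.
func (s *service) startInit(pid int, cp containerProcess,
	start func() error, publishTaskStart func()) error {

	s.lifecycleMu.Lock()
	delete(s.running, pid) // Create added it; remove it from the normal path
	s.lifecycleMu.Unlock()

	if err := start(); err != nil {
		return err
	}
	publishTaskStart() // exits observed before this point are early exits

	s.lifecycleMu.Lock()
	s.running[pid] = cp // normal exit handling owns the PID again
	s.lifecycleMu.Unlock()
	return nil
}

func main() {
	s := &service{running: map[int]containerProcess{
		42: {containerID: "c1"}, // added by Create
	}}
	_ = s.startInit(42, containerProcess{containerID: "c1"},
		func() error { return nil }, func() {})
}
```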
Force-pushed from c43966c to 5c52599
eventSendMu is causing severe lock contention when multiple processes start and exit concurrently. Replace it with a different scheme for maintaining causality w.r.t. start and exit events for a process, one which does not rely on big locks for synchronization.

Keep track of all processes for which a Task(Exec)Start event has been published and which have not yet exited in a map, keyed by their PID. Processing an exit is then as simple as looking up which process corresponds to the PID. If no started process is known for that PID, the PID must either belong to a process which was started by s.Start() before the s.Start() call has added the process to the map of running processes, or to a reparented process which we don't care about. Handle the former case by having each s.Start() call subscribe to exit events before starting the process, and check whether the PID exited in the time between starting the process and publishing the TaskStart event, handling the exit if it did. Exit events for reparented processes received when no s.Start() calls are in flight are immediately discarded, and events received during an s.Start() call are discarded when the s.Start() call returns.

Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
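As a rough illustration of the scheme this commit message describes, a condensed sketch with stand-in names (`handleExit`, `earlyExits`, `handleStarted`); the real implementation guards its bookkeeping with `lifecycleMu` and uses per-`Start` subscriptions rather than a single buffered slice:

```go
package main

import (
	"fmt"
	"sync"
)

type exit struct{ pid, status int }

type service struct {
	mu         sync.Mutex
	running    map[int]string // pid -> container ID, stands in for containerProcess
	earlyExits []exit         // exits seen for PIDs not yet recorded as running
}

// handleExit publishes an exit for a known running PID; otherwise the exit
// is buffered in case a Start call is still in flight for that PID.
func (s *service) handleExit(e exit) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.running[e.pid]; ok {
		fmt.Printf("TaskExit: container %s pid %d status %d\n", id, e.pid, e.status)
		delete(s.running, e.pid)
		return
	}
	s.earlyExits = append(s.earlyExits, e)
}

// handleStarted runs after the start event is published: if the process
// already died, the buffered exit is replayed; otherwise it is recorded.
func (s *service) handleStarted(id string, pid int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i, e := range s.earlyExits {
		if e.pid == pid {
			fmt.Printf("TaskExit (early): container %s pid %d status %d\n", id, pid, e.status)
			s.earlyExits = append(s.earlyExits[:i], s.earlyExits[i+1:]...)
			return
		}
	}
	s.running[pid] = id
}

func main() {
	s := &service{running: map[int]string{}}
	s.handleExit(exit{pid: 7, status: 1}) // exit arrives before handleStarted
	s.handleStarted("c1", 7)              // replayed as an early exit
}
```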
Force-pushed from 5c52599 to 5cd6210
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls handleStarted() to record the init process and handle its exit. Under normal circumstances the init process wouldn't quit so early, but if this scenario occurs, handleStarted() calls handleProcessExit(), which deadlocks: create() has already acquired s.mu, and handleProcessExit() tries to lock it again. I found that after PR containerd#8617, handleProcessExit() no longer accesses s.containers, so we can remove the unnecessary lock guard to avoid the deadlock.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls handleStarted() to record the init process and handle its exit. The init process wouldn't quit so early in normal circumstances, but if this scenario occurs, handleStarted() calls handleProcessExit(), which deadlocks because create() has already acquired s.mu and handleProcessExit() tries to lock it again. So I added a handleProcessExitNoLock() function which does not lock s.mu and can safely be called from the create handler without deadlocking.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls handleStarted() to record the init process and handle its exit. The init process wouldn't quit so early in normal circumstances, but if this scenario occurs, handleStarted() calls handleProcessExit(), which deadlocks because create() has already acquired s.mu and handleProcessExit() tries to lock it again. So I added a parameter muLocked to handleStarted to indicate whether s.mu is currently locked, and thus decide whether to lock it when calling handleProcessExit.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
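A minimal sketch of the muLocked idea (the signatures here are illustrative, not the actual shim API):

```go
package main

import "sync"

type service struct {
	mu sync.Mutex
}

// handleProcessExit must run with s.mu held; the caller says whether it
// already holds the lock, so we never try to acquire it twice.
func (s *service) handleProcessExit(muLocked bool) {
	if !muLocked {
		s.mu.Lock()
		defer s.mu.Unlock()
	}
	// ... handle the exit with s.mu held ...
}

// handleStarted may be reached from create(), which already holds s.mu.
func (s *service) handleStarted(exitedEarly, muLocked bool) {
	if exitedEarly {
		s.handleProcessExit(muLocked) // no second Lock when create() holds s.mu
	}
}

func main() {
	s := &service{}
	s.mu.Lock()
	s.handleStarted(true, true) // from create(): would deadlock without muLocked
	s.mu.Unlock()
}
```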
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls handleStarted() to record the init process and handle its exit. The init process wouldn't quit so early in normal circumstances, but if this scenario occurs, handleStarted() calls handleProcessExit(), which deadlocks because create() has already acquired s.mu and handleProcessExit() tries to lock it again.

For historical reasons, the use of s.mu is a bit confusing: sometimes it protects only s.containers, and sometimes it serves as a mutex for whole functions. From @corhere's analysis in containerd#9103 we know that locking s.mu in the Create() handler was introduced in containerd#1452 as a solution for the missed-early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. containerd#8617 solves the same problem with a strategy that does not rely on mutual exclusion between container creation and exit-event handling.

As for Shutdown(), the implementation was added in containerd#3004. In that initial implementation the mutex is locked to safely access s.containers and is never unlocked, because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by containerd#4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution: it makes create requests hang instead of returning an error, and it is ineffective after Shutdown() returns since the mutex is unlocked. If Create() or handleProcessExit() is entered again after Shutdown() unlocks s.mu and returns, the shim service will panic by sending on a closed channel.

So I remove the unnecessary lock guards in service, and rename mu to containersMu to make it clear that it protects s.containers only.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls handleStarted() to record the init process and handle its exit. The init process wouldn't quit so early in normal circumstances, but if this scenario occurs, handleStarted() calls handleProcessExit(), which deadlocks because create() has already acquired s.mu and handleProcessExit() tries to lock it again. So I added a parameter muLocked to handleStarted to indicate whether s.mu is currently locked, and thus decide whether to lock it when calling handleProcessExit.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
(cherry picked from commit 68dd47e)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
…/main Merge upstream containerd/main at commit 5d1ab01 into ado fork-external/main Related work items: containerd#7944, containerd#8174, containerd#8334, containerd#8362, containerd#8572, containerd#8582, containerd#8588, containerd#8605, containerd#8606, containerd#8617, containerd#8619, containerd#8626, containerd#8627, containerd#8633, containerd#8637, containerd#8641, containerd#8643, containerd#8645, containerd#8652, containerd#8667, containerd#8672, containerd#8673, containerd#8676, containerd#8680, containerd#8684, containerd#8685, containerd#8692, containerd#8696, containerd#8697, containerd#8701, containerd#8708, containerd#8717, containerd#8726, containerd#8728, containerd#8729, containerd#8731, containerd#8732, containerd#8740, containerd#8752, containerd#8758, containerd#8762, containerd#8764
Big thanks to @laurazard for analyzing the root cause of #8557 and for also developing a fix. I would not have come up with the idea for this solution without inspiration from hers.