This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

all processes in container receive SIGTERM when sending SIGTERM to rkt process #3512

Open
blalor opened this issue Jan 5, 2017 · 9 comments

Comments

@blalor commented Jan 5, 2017

Environment

rkt Version: 1.21.0
appc Version: 0.8.9
Go Version: go1.7.3
Go OS/Arch: linux/amd64
Features: -TPM +SDJOURNAL
--
Linux 4.9.0-1.el7.elrepo.x86_64 x86_64
--
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
--
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

What did you do?

With the attached file at /tmp/assassin.py:

rkt run \
    --debug \
    --insecure-options=ondisk,image \
    --mount volume=srv,target=/srv/ \
    --volume srv,kind=host,source=/tmp/ docker://python:2.7 \
    -- \
    /srv/assassin.py

In another window ¡ASSUMING THERE ARE NO OTHER CONTAINERS RUNNING!:

pkill --echo --full stage1/rootfs/usr/bin/systemd-nspawn

Wait for the container to exit. Then:

grep -h 'got signal' /tmp/assassin_*.log

What did you expect to see?

2017-01-05 23:19:16,066 pid:5 - got signal SIGTERM

What did you see instead?

2017-01-05 23:15:12,592 pid:10 - got signal SIGTERM
2017-01-05 23:15:12,593 pid:11 - got signal SIGTERM
2017-01-05 23:15:12,593 pid:12 - got signal SIGTERM
2017-01-05 23:15:12,594 pid:13 - got signal SIGTERM
2017-01-05 23:15:12,594 pid:14 - got signal SIGTERM
2017-01-05 23:15:12,595 pid:15 - got signal SIGTERM
2017-01-05 23:15:12,595 pid:16 - got signal SIGTERM
2017-01-05 23:15:12,596 pid:17 - got signal SIGTERM
2017-01-05 23:15:12,596 pid:18 - got signal SIGTERM
2017-01-05 23:15:12,597 pid:19 - got signal SIGTERM
2017-01-05 23:15:12,597 pid:5 - got signal SIGTERM

What just happened here?

assassin.py spawns 10 child processes. The parent and all the children write a log message whenever they receive SIGTERM, but they do not exit. The parent does not automatically propagate signals to its children (inhibited via os.setpgrp()). Therefore, sending SIGTERM to the parent process (or the process that spawned the container) should only result in a single log message being generated by the parent. This is in fact exactly what happens when you run assassin.py in a terminal and send the parent process SIGTERM from another terminal. rkt (or more likely systemd), on the other hand, sends SIGTERM to every single process in the container.
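
The attachment is not reproduced in the issue body, so here is a minimal sketch of what such a script could look like (a hypothetical reconstruction, not the actual assassin.py; the log format and file names are assumptions based on the output above):

#!/usr/bin/env python
# Hypothetical reconstruction of assassin.py: a parent and ten children each
# log SIGTERM to /tmp/assassin_<pid>.log and keep running; the parent never
# forwards the signal to its children.
import logging
import os
import signal
import time

def log_and_wait():
    logging.basicConfig(
        filename="/tmp/assassin_%d.log" % os.getpid(),
        level=logging.INFO,
        format="%(asctime)s pid:%(process)d - %(message)s",
    )
    # Log the signal, but do not exit.
    signal.signal(signal.SIGTERM,
                  lambda signum, frame: logging.info("got signal SIGTERM"))
    while True:
        time.sleep(1)

if __name__ == "__main__":
    for _ in range(10):
        if os.fork() == 0:    # child
            os.setpgrp()      # own process group: signals aimed at the parent's group never reach it
            log_and_wait()    # never returns
    log_and_wait()            # the parent just waits and logs as well

Run in a plain terminal, only the parent logs the signal when you SIGTERM it from another window; under rkt, as shown above, every process logs one.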

This makes it very difficult for a process which spawns children to shut down properly when the container is being shut down, especially when trying to wrangle an application whose source you don't directly control. When combined with #2870, it is impossible to implement any kind of processing after the main application (process, whatever) has exited on command from the rkt runtime.

@squeed (Contributor) commented Jan 9, 2017

This is probably happening because the default systemd KillMode is control-group. I wonder if setting it to mixed in the stage1 unit file would be the correct approach.
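
For reference, the knob in question lives in the [Service] section of the generated app unit in stage1; a sketch only, since the unit rkt currently generates (quoted later in this thread) does not set it:

[Service]
# control-group (the default) delivers the stop signal to every process in the unit's cgroup;
# mixed delivers SIGTERM only to the main process and SIGKILLs whatever is still alive after the stop timeout;
# process touches only the main process.
KillMode=mixed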

@lucab (Member) commented Jan 9, 2017

Possibly. But they would receive SIGTERM anyway: when the whole pod goes down (i.e. systemd-nspawn is killed), systemd-pid1 does a SIGTERM+SIGKILL round of its own. I don't see any easy way out of this, except for keeping the pod running à la rkt-app.

@blalor (Author) commented May 4, 2017

This issue has come up again for me. There's gotta be some kind of solution or workaround. I'm a big boy, I can handle my own signals, thankyouverymuch systemd. I really don't want to use Docker for my current problem. Or worse, not run in a container at all!

@blalor (Author) commented May 4, 2017

There's a generated systemd .service file in the stage1 rootfs that is (or appears to be) specifically for the single application that's been spawned:

[root@elasticsearch-0ed6f9c740fd9491c:/proc/6933/cwd/stage1/rootfs] # cat ./usr/lib64/systemd/system/elasticsearch.service
[Unit]
OnFailure=halt.target
Description=Application=elasticsearch Image=s3.amazonaws.com/example/apps/elasticsearch
DefaultDependencies=false
Wants=reaper-elasticsearch.service
Requires=sysusers.service
After=sysusers.service
Requires=prepare-app@-opt-stage2-elasticsearch-rootfs.service
After=prepare-app@-opt-stage2-elasticsearch-rootfs.service

[Service]
Restart=no
SyslogIdentifier=elasticsearch
StandardInput=null
StandardOutput=journal+console
StandardError=journal+console
TimeoutStartSec=0
ExecStart="/usr/local/bin/launch-elasticsearch.sh"
RootDirectory=/opt/stage2/elasticsearch/rootfs
WorkingDirectory=/
EnvironmentFile=/rkt/env/elasticsearch
User=0
Group=0
NoNewPrivileges=false
MemoryLimit=26000000000
CPUQuota=650%

Why can't that single service have KillMode set to mixed or process without impacting the pod as a whole? I'm sorry, I've never used rkt with multiple apps in a pod, so perhaps I'm missing a detail (or something bigger). I need time to orchestrate the shutdown of the application, but obviously more serious action needs to be taken if the timeout's exceeded.
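
For concreteness, a drop-in along these lines is what I have in mind (the path, timeout value, and whether rkt's stage1 would tolerate it are all assumptions on my part, not something I've tested):

# /usr/lib64/systemd/system/elasticsearch.service.d/10-killmode.conf (illustrative)
[Service]
KillMode=mixed       # SIGTERM goes only to the main process, so it can orchestrate its own shutdown
TimeoutStopSec=600   # anything still running after this is SIGKILLed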

@lucab (Member) commented May 5, 2017

@blalor is this a pod with a single app inside? My understanding is that your root problem comes from the pod also being torn down with this single application.

@blalor (Author) commented May 5, 2017

Yes, it's a single-application pod. I need the application to shut down in a controlled fashion, whereby the process launched by systemd is able to initiate cleanup and then notify child processes to exit. I can't do that if the child processes are told to terminate by systemd (and I'm unable to inhibit the child processes' SIGTERM handling).

This is the second time I've run into this problem, which revolves around managing data for a stateful application. I'm currently attempting to get an Elasticsearch node to remove itself from the cluster by deallocating shards on shutdown. A wrapper script is responsible for starting and stopping the main ES process; when SIGTERM is received, it updates the cluster state to move shards away from the terminating node, waits for the shards to be relocated, and then sends SIGTERM to the main ES process.
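
Roughly, the wrapper does something like the following (a simplified sketch, not the actual launch-elasticsearch.sh; the ES binary path, endpoints, and drain criterion are placeholders):

#!/bin/sh
# Start the real ES process, then drain shards and stop it when SIGTERM arrives.
/usr/share/elasticsearch/bin/elasticsearch &
ES_PID=$!

drain_and_stop() {
    # Ask the cluster to move shards off this node.
    curl -s -XPUT localhost:9200/_cluster/settings \
        -d '{"transient":{"cluster.routing.allocation.exclude._name":"'"$(hostname)"'"}}'
    # Wait until nothing is relocating any more, then stop the real ES process.
    while curl -s localhost:9200/_cluster/health | grep -q '"relocating_shards":[^0]'; do
        sleep 5
    done
    kill -TERM "$ES_PID"
}

trap drain_and_stop TERM
wait "$ES_PID"   # returns early when the trap fires...
wait "$ES_PID"   # ...so wait again for ES itself to exit

None of this works if systemd has already delivered SIGTERM to the ES child directly, which is exactly the problem described in this issue.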

@lucab (Member) commented May 5, 2017

I'm not sure if this works, but you may as well try: instead of stopping the pod, just enter the running stage1 (via nsenter on the systemd-pid1 process) and do a systemctl kill or a plain kill on the parent service. This should let the parent do whatever it needs to handle children and then exit. I understand this is quite dirty, but it is actually how we are planning to implement rkt signal for #1496.
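
Untested, but the shape of it would be something like this (how you locate the stage1 systemd pid and the exact unit name are up to you):

# $STAGE1_SYSTEMD_PID is the host pid of the pod's systemd, i.e. the child of the
# systemd-nspawn process for this pod; the unit name matches the app name.
nsenter --target "$STAGE1_SYSTEMD_PID" --mount --uts --ipc --net --pid \
    systemctl kill --kill-who=main --signal=SIGTERM elasticsearch.service
# --kill-who=main signals only the unit's main process and leaves its children to it.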

@blalor (Author) commented May 5, 2017

I'm working with containers scheduled via Nomad; that workaround isn't a viable production solution. I can test it on Monday against a running pod if you're looking for verification of the final config of the unit, but it's not something I can entertain in a production scenario.

@fabiokung (Contributor) commented:

I started working on custom KillModes on #3732
