Kubelet failed to detect running docker process when name is not docker #26259

Closed
cuonglm opened this issue May 25, 2016 · 30 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

cuonglm commented May 25, 2016

I used these instructions to install OpenShift Origin v3, along with its embedded Kubernetes. Everything was fine until I got this error:

E0525 16:17:20.437528    1899 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid of "docker": exit status 1

This is odd, because I'm sure that docker is running:

$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-05-25 15:26:01 ICT; 56min ago
     Docs: http://docs.docker.com
 Main PID: 3291 (sh)
   CGroup: /system.slice/docker.service
           ├─3291 /bin/sh -c /usr/bin/docker-current daemon $OPTIONS            $DOCKER_STORAGE_OPTIONS            $DOCKER_NETWORK_OPTIONS            $ADD_...
           ├─3292 /usr/bin/docker-current daemon --selinux-enabled
           └─3293 /usr/bin/forward-journald -tag docker

$ docker search lamdt
INDEX       NAME                              DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
docker.io   docker.io/lamdt/salt-master-ssh   Based on Ubuntu 12.04. That container incl...   0

I went through the kubelet source and seem to have found the culprit, in this line:

dockerProcessName = "docker"

dockerProcessName is hard-coded to docker, while on my system the docker process is named docker-current:

$ ps aux | grep [d]ocker
root      3291  0.0  0.0 115244  1432 ?        Ss   15:25   0:00 /bin/sh -c /usr/bin/docker-current daemon $OPTIONS            $DOCKER_STORAGE_OPTIONS            $DOCKER_NETWORK_OPTIONS            $ADD_REGISTRY            $BLOCK_REGISTRY            $INSECURE_REGISTRY            2>&1 | /usr/bin/forward-journald -tag docker
root      3292  0.4  0.3 619668 29684 ?        Sl   15:25   0:15 /usr/bin/docker-current daemon --selinux-enabled
root      3293  0.0  0.0 101728  1904 ?        Sl   15:25   0:00 /usr/bin/forward-journald -tag docker

/usr/bin/docker is actually a shell script that calls docker-current:

$ command -v docker
/usr/bin/docker
$ file "$(command -v docker)"
/usr/bin/docker: POSIX shell script, ASCII text executable

My OS is:

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

I'm not sure whether this should be considered an OS bug or a bug in the kubelet itself.

Member

dims commented May 25, 2016

@Gnouc It's probably best to open a bug against OpenShift Origin and close this one:
https://github.com/openshift/origin/issues

Author

cuonglm commented May 25, 2016

@dims Why OpenShift Origin?

If I run Kubernetes alone, the same issue occurs.

I think we should either declare this an OS bug or make the kubelet handle this situation.

Member

dims commented May 25, 2016

@smarterclayton - what's the best venue for this bug?

@smarterclayton
Contributor

This is a Kubernetes bug - the line referenced by @Gnouc points out that the kubelet assumes the docker binary is called "docker" which is not necessarily the case.

smarterclayton added the sig/node label May 25, 2016

@smarterclayton
Contributor

I think at a minimum that should not be a constant, but be defaulted. The workaround for now is to hardlink the docker-current process you have to docker.

smarterclayton changed the title from "Kubelet failed to detect running docker process" to "Kubelet failed to detect running docker process when name is not docker" May 25, 2016

Member

dims commented May 25, 2016

@smarterclayton thanks, sounds good. I have heard many times that there are patches/hardening to docker and kubernetes and was not sure.

Author

cuonglm commented May 25, 2016

@smarterclayton If you don't mind, I can push the fix for this issue.

acavas1 commented May 27, 2016

@smarterclayton I'm having the same issue. Would you mind explaining how to hardlink the docker-current process to docker? Thanks.

@TomasTomecek

ln docker-current docker

acavas1 commented May 27, 2016

thanks

acavas1 commented May 27, 2016

I couldn't create a hard link in the bin directory because there is already a file named docker, so I moved docker-current, renamed it to docker, and edited /usr/lib/systemd/system/docker.service to point to the new location of the renamed file.
Now I don't get the error, but I get ������ in the browser when I try to access the console.

acavas1 commented May 27, 2016

My mistake: I didn't access it with https.

@smarterclayton
Contributor

@derekwaynecarr system container Cgrouping is broken with the latest docker packages due to the process name changing.

Member

dims commented May 27, 2016

@acavas1: Can you try setting the pidfile parameter for your docker daemon? That may be an easier workaround.

--pidfile=/var/run/docker.pid

Thanks,
Dims
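
The pidfile suggestion amounts to a different lookup strategy: read the PID that the daemon recorded instead of searching by process name. The following is a minimal sketch of that idea only. `find_daemon_pid` and its arguments are hypothetical names for illustration, not kubelet code, and a later comment in this thread reports that setting --pidfile alone did not help on origin 1.2.0, so whether a given kubelet build consults the pidfile should not be assumed.

```sh
# Hypothetical helper (not kubelet code): resolve a daemon PID from a pidfile
# first, then fall back to looking up candidate process names with pidof.
# Usage: find_daemon_pid PIDFILE NAME...
find_daemon_pid() {
    fdp_pidfile=$1
    shift
    if [ -r "$fdp_pidfile" ]; then
        fdp_pid=$(tr -d '[:space:]' < "$fdp_pidfile")
        case $fdp_pid in
            '' | *[!0-9]*)
                ;;  # empty or non-numeric content: ignore the pidfile
            *)
                # Accept the PID only if such a process exists (Linux /proc).
                if [ -d "/proc/$fdp_pid" ]; then
                    printf '%s\n' "$fdp_pid"
                    return 0
                fi
                ;;
        esac
    fi
    # Name-based fallback; pidof may print several PIDs separated by spaces.
    command -v pidof >/dev/null 2>&1 || return 1
    for fdp_name in "$@"; do
        if fdp_pid=$(pidof "$fdp_name"); then
            printf '%s\n' "$fdp_pid"
            return 0
        fi
    done
    return 1
}
```

On the system described in this issue, a call such as `find_daemon_pid /var/run/docker.pid docker docker-current` would try the pidfile first and only then the two names.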

Author

cuonglm commented May 27, 2016

@acavas1 Change the docker script to use #!/bin/bash, then change exec on its last line to exec -a docker.

@derekwaynecarr
Member

This has been fixed in Kubernetes here:
#25907

cuonglm closed this as completed May 27, 2016

@derekwaynecarr
Member

The origin cherry pick is here: openshift/origin#9060

@priyanka5

@TomasTomecek @smarterclayton: I am also facing the same issue in origin 1.2.0 with the quick installation method. Can anybody tell me whether this is resolved, what docker-current is, and how to use the workaround?

@smarterclayton
Contributor

Hi @priyanka5, let's move this discussion to the origin repository (issue openshift/origin#9060)

ghost commented Jun 28, 2016

@dims I tried your solution on origin 1.2.0 (--pidfile in /etc/sysconfig/docker, then a restart), but it doesn't seem to help for me.

ghost commented Jun 29, 2016

@Gnouc What does your /usr/bin/docker look like? I tried to apply your changes, but the same error is still in the log after restarting.
Still the same issue on:

oc v1.2.0
kubernetes v1.2.0-36-g4a3f9c5

Author

cuonglm commented Jun 29, 2016

@lvthillo Here it is:

#!/bin/sh
. /etc/sysconfig/docker
[ -e "${DOCKERBINARY}" ] || DOCKERBINARY=/usr/bin/docker-current
if [ ! -f /usr/bin/docker-current ]; then
    DOCKERBINARY=/usr/bin/docker-latest
fi
if [[ ${DOCKERBINARY} != "/usr/bin/docker-current" && ${DOCKERBINARY} != /usr/bin/docker-latest ]]; then
    echo "DOCKERBINARY has been set to an invalid value:" $DOCKERBINARY
    echo ""
    echo "Please set DOCKERBINARY to /usr/bin/docker-current or /usr/bin/docker-latest
by editing /etc/sysconfig/docker"
else
    exec ${DOCKERBINARY} "$@"
fi

Change the shebang to #!/bin/bash, then change exec ${DOCKERBINARY} "$@" to exec -a docker ${DOCKERBINARY} "$@"

ghost commented Jun 29, 2016

@Gnouc Thanks for the fast reply.
Now my /usr/bin/docker looks like this:

#!/bin/bash
. /etc/sysconfig/docker
[ -e "${DOCKERBINARY}" ] || DOCKERBINARY=/usr/bin/docker-current
if [ ! -f /usr/bin/docker-current ]; then
    DOCKERBINARY=/usr/bin/docker-latest
fi
if [[ ${DOCKERBINARY} != "/usr/bin/docker-current" && ${DOCKERBINARY} != /usr/bin/docker-latest ]]; then
    echo "DOCKERBINARY has been set to an invalid value:" $DOCKERBINARY
    echo ""
    echo "Please set DOCKERBINARY to /usr/bin/docker-current or /usr/bin/docker-latest
by editing /etc/sysconfig/docker"
else
    exec -a docker ${DOCKERBINARY} "$@"
fi

I restarted everything:

sudo service docker restart
sudo service origin-node restart

In the logs:
E0629 10:42:53.022798 9519 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid of "docker": exit status 1

ghost commented Jun 29, 2016

What I don't really understand is the fact that:

pidof docker returns nothing
pidof docker-current returns pid
cat /var/run/docker.pid returns same pid as docker-current.
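
One way to see why these three results can coexist: the pidfile stores a number, while pidof searches by name, and on Linux a process carries more than one name. The sketch below prints two of them for a given PID. `proc_names` is a hypothetical helper, and it assumes a Linux /proc filesystem; exactly which of these fields a particular pidof implementation matches against is not something this thread establishes.

```sh
# Hypothetical diagnostic helper: show the names Linux records for a PID.
#   comm  - short name derived from the executable's file name
#   argv0 - first argument the process was started with
proc_names() {
    [ -r "/proc/$1/comm" ] || return 1
    pn_comm=$(cat "/proc/$1/comm")
    pn_argv0=$(tr '\0' '\n' < "/proc/$1/cmdline" | head -n 1)
    printf 'comm=%s argv0=%s\n' "$pn_comm" "$pn_argv0"
}
```

Running `proc_names "$(cat /var/run/docker.pid)"` on the affected host would show which names the daemon actually runs under, which can then be compared with the name the kubelet searches for.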

@h-keisuke

/usr/bin/docker is not used by systemd.
See /usr/lib/systemd/system/docker.service

ghost commented Jun 29, 2016

@khanamura
What should I edit?

[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
ExecStart=/bin/sh -c '/usr/bin/docker-current daemon \
          --exec-opt native.cgroupdriver=systemd \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          2>&1 | /usr/bin/forward-journald -tag docker'
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
MountFlags=slave
Restart=on-abnormal
StandardOutput=null
StandardError=null

[Install]
WantedBy=multi-user.target

Author

cuonglm commented Jun 29, 2016

@lvthillo Change sh -c '/usr/bin/docker-current ... to sh -c 'exec -a docker /usr/bin/docker-current ...

h-keisuke commented Jun 29, 2016

@lvthillo
Change
sh -c '/usr/bin/docker-current ...
to
sh -c 'exec -a docker /usr/bin/docker-current ...

and then

systemctl daemon-reload
systemctl restart docker.service
systemctl restart origin-node.service
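
The effect of exec -a can be tried safely on a throwaway process before editing the unit file. This is a minimal sketch, assuming bash is installed (exec -a is a bash builtin option) and a Linux /proc filesystem; `spawn_renamed`, `SPAWNED_PID`, and the name docker-demo are hypothetical and chosen only for illustration.

```sh
# Hypothetical helper: start a command in the background with argv[0]
# overridden via bash's `exec -a`. The PID is left in SPAWNED_PID.
# Usage: spawn_renamed NEW_NAME COMMAND [ARGS...]
spawn_renamed() {
    sr_name=$1
    shift
    # Inside `bash -c`, $0 is the new name and "$@" is the command to run.
    # Output is discarded so the background job does not hold the terminal.
    bash -c 'exec -a "$0" "$@"' "$sr_name" "$@" >/dev/null 2>&1 &
    SPAWNED_PID=$!
}

# Example (not run here):
#   spawn_renamed docker-demo sleep 30
#   tr '\0' ' ' < "/proc/$SPAWNED_PID/cmdline"   # inspect the recorded argv
#   kill "$SPAWNED_PID"
```

The executable on disk is unchanged by this; only the name passed as the first argument differs, which is the same thing the edited ExecStart line does for docker-current.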

ghost commented Jun 29, 2016

@khanamura @Gnouc Thanks, it works

@mootezbessifi

@khanamura
Hi,
I followed exactly the same steps as above. Sometimes the cgroup changes to docker, and sometimes it is still docker-current.
This issue has taken up too much of my time.
Please help.
