Cannot deploy GlusterFS through Kubernetes - "Couldn't find an alternative telinit implementation to spawn" #48937
@w17chm4n
@kubernetes/sig-node-bugs
One update - I can deploy this POD with no problems on Kubernetes 1.6.
@w17chm4n I am hitting the same bug on a freshly installed k8s cluster
For me, downgrading to a supported Docker version (v1.12) helped to resolve this issue.
@bnerd - Thanks for the info!
This is not really a bug, since Kubernetes doesn't yet support the Docker version you are using, but feel free to send a patch if you find the problem.
@yujuhong I was thinking about your comment and I think it is still a bug, since downgrading Kubernetes to 1.6 fixes the problem with the same setup as described above. So something has changed in 1.7. But yeah, I will try to investigate on my own.
I'm actually getting this same error with 1.7.1 and docker 1.13.1. Does anyone have other thoughts on what the issue might be here? I found this SO question (https://stackoverflow.com/questions/36545105/docker-couldnt-find-an-alternative-telinit-implementation-to-spawn) but it looks like everything is configured correctly.
In Kubernetes 1.7, PID 1 is taken by the "/pause" process, and since init expects to run as PID 1, that PID is no longer available. In earlier versions of Kubernetes, PID 1 was free.
Kubernetes now shares a single PID namespace among all containers in a pod when running with docker >= 1.13.1. This means processes can now signal processes in other containers in a pod, but it also means that the pause container, not the image's own entrypoint, holds PID 1.
I can work around the problem using the
In the 1.7 release, we enabled a shared PID namespace when the docker version is 1.13 or beyond, for the debug pod feature. Does anyone know why images such as GlusterFS have to run as PID 1? @verb we need to figure out how to resolve this conflict with the shared PID namespace. If we cannot, we have to disable that setting for 1.8, since we are scheduled to support docker 1.13 in the 1.8 timeframe. I am currently assigning you this bug to come up with a solution.
cc/ @abgworrall |
gluster/gluster-centos is a container image that runs systemd so that it can run ntpd, crond, gssproxy, glusterd & sshd inside a single container. This isn't how Kubernetes is intended to be used, and I don't know how much we should go out of our way to support it (this is a question for @dchen1107 & @yujuhong). https://developers.redhat.com/blog/2016/09/13/running-systemd-in-a-non-privileged-container/ seems to be a good write-up of the challenges of running systemd inside of docker. It also prevents Kubernetes from doing process management like:
Ideally we could change this many-processes-in-a-container to be many-containers-in-a-pod, something like this:
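(The manifest from this comment wasn't preserved in this copy of the thread; the following is a hypothetical sketch of the many-containers-in-a-pod direction, with the container commands as assumptions — `glusterd -N` and `crond -n` keep each service in the foreground.)

```yaml
# Hypothetical sketch, not the original manifest: run each service as its
# own container in one pod instead of running systemd inside one container.
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs
spec:
  containers:
    - name: glusterd
      image: gluster/gluster-centos
      command: ["glusterd", "-N"]   # -N: stay in the foreground (assumed flag)
    - name: crond
      image: gluster/gluster-centos
      command: ["crond", "-n"]      # -n: stay in the foreground (assumed flag)
```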
This doesn't need a privileged container and doesn't run sshd or ntpd, neither of which are needed in Kubernetes. It also doesn't run crond, but that may actually be needed by gluster; I've never used gluster.

If you must run systemd in a container, adding this to the container config will get systemd to run:
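(The exact snippet wasn't preserved in this copy; the following is a reconstruction based on the explanation in the next sentence, with the init path inside the image an assumption.)

```yaml
# Reconstructed from the description below, not the verbatim original.
command: ["/usr/sbin/init"]        # assumed path to systemd in the image
args: ["--system"]                 # run in system mode despite not being PID 1
env:
  - name: SYSTEMD_IGNORE_CHROOT    # disable systemd's chroot detection
    value: "1"
```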
This tells systemd to run in system mode even though it's not PID 1, and disables its chroot detection (which is unrelated to PID 1 but checked when invoked as "systemd", I guess).
@dchen1107 @verb afaict, running
@humblec Those articles are about how to work around shortcomings in vanilla docker that Kubernetes solves natively: resource sharing via pods, confinement via SELinux, reaping orphaned zombies, etc. My earlier question was along the lines of how much effort should go into supporting use cases that are not using Kubernetes as intended. (I'm actually asking, not trying to make a point.) Does Kubernetes commit to continuing support for every container image that's ever run on Kubernetes? For how long? Indefinitely? Or can we change it over the course of multiple releases? Anyway, this change is compatible with running an init system in a container. The problem is with images that assume they will only ever have PID 1.

Brainstorming other options:
Yes, running systemd in a container is a critical need for us. There are lots of situations where it is and will be used. The biggest use case is for people simply moving current workloads that run in a VM into a container.
If the intention is to force all containers inside of a POD to run with the same PID namespace, I think this is a mistake, since it eliminates another really cool use case. I would love to be able to run two containers inside of a pod: an INIT container with escalated privileges and a locked-down container with no privileges. I want to make sure the locked-down container cannot see the processes of the privileged container. An example of this might be a container that loads a kernel module that is needed by the locked-down container. Another use case would be a buildah container: the INIT container mounts a COW file system onto a directory in a location that the locked-down container can write to, the locked-down container writes out the data for its image, and when the container completes, the INIT container commits the data to the repo. I believe allowing different containers inside of a POD to run with different security and isolated PIDs should not be removed.
@rhatdan none of the things you've mentioned are removed by this change, save using docker to hide other processes in the pod, and I'm not sure why that's a requirement. Is anyone currently using the "privileged sidecar" pattern you describe, or is it hypothetical?

@dchen1107 @derekwaynecarr @yujuhong It just occurred to me that an interesting compromise here might be to only enable shared pid for multi-container pods. This would enable pod semantics for PID when it would make a difference, while drop-in docker monoliths keep the PID 1 behavior they're expecting in docker. I'd have to double check, but I'm pretty sure that behavior could be implemented entirely in the dockershim with no API or CRI changes.
I think the best solution is to make PID namespace sharing optional.
@rhatdan - optional at what scope: cluster, node, or pod? Let's sync up before the next sig-node and we can revisit if need be before 1.8 goes out. The main question right now is whether pid namespace sharing is opt-in or opt-out. We need to weigh the pros/cons of both.
I would expect it at the POD level, since pods would have different requirements; not sure why you would want to have this at the cluster or node level.
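(For context beyond this thread: per-pod control is roughly where Kubernetes eventually landed. Much later releases expose PID sharing as an explicit opt-in field on the pod spec; a minimal sketch, not applicable to the 1.7/1.8 versions discussed here:)

```yaml
# Sketch of the per-pod opt-in added in later Kubernetes releases
# (not available in the 1.7/1.8 versions discussed in this thread).
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-example
spec:
  shareProcessNamespace: true     # all containers in this pod share one PID namespace
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
```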
@verb yes it works :) |
@w17chm4n awesome, thanks! I'd like to better understand the requirements of people objecting to this change (@rhatdan @jarrpa @humblec). I think there's a bit of confusion about what has changed, so let's start with some facts:
I can think of a few things that are objectionable about the change, but let's state them as requirements. For example, are these requirements for anyone?
Rather than having Kubernetes work around this container image, should we patch the Dockerfile to make it compatible with even the native docker shared-pid, unrelated to Kubernetes? |
We can ask the systemd guys why they recommend never using the --system flag, but I would guess one of the reasons is that processes showing up that were not created in the ancestry of systemd could cause confusion. I would also guess that running more than one container with systemd --system would not be supported.
This is required so that you can run different containers with different security constraints in the same pod at the same time. Your last comment makes the
This is the part I don't understand. Why? What does PID visibility have to do with security constraints? What am I missing?
nor do they run with a shared network namespace, but we impose that on docker containers because that's the pod model. |
An update from the last SIG Node: we'll disable this by default in 1.8 and revisit for 1.9. I'll use #41938 to track that.
@verb, if I have a privileged container process in the same pid namespace as a pid in a non-privileged container, then a process in the non-privileged container can attack the privileged process. Having them in separate pid namespaces gives you better security. Bottom line: you have a potential change where containers that expect to be run as PID 1, whether legitimately or not, could break when you make this change. Kubernetes, in my opinion, should make it easy to opt out of this behavior.
/priority critical-urgent |
Update: #51634 is pending review and should be added to the v1.8 Milestone to resolve this issue. |
Automatic merge from submit-queue (batch tested with PRs 51984, 51351, 51873, 51795, 51634)

Revert to using isolated PID namespaces in Docker

**What this PR does / why we need it**: Reverts to the previous docker default of using isolated PID namespaces for containers in a pod. There exist container images that expect always to be PID 1, which we want to support unmodified in 1.8.

**Which issue this PR fixes**: fixes #48937

**Release note**:
```release-note
Sharing a PID namespace between containers in a pod is disabled by default in 1.8. To enable for a node, use the --docker-disable-shared-pid=false kubelet flag. Note that PID namespace sharing requires docker >= 1.13.1.
```
In the 1.7 release the shared PID namespace is enabled by default. This causes a problem when running a pbench container, which runs systemd. Related issue in upstream kubernetes: kubernetes/kubernetes#48937. According to my understanding this will be fixed from 1.8 with the following patch: kubernetes/kubernetes#51634
With this, the pause container can handle zombie processes. See: https://www.ianlewis.org/en/almighty-pause-container. Sorry for the glusterfs VM container: kubernetes/kubernetes#48937 (comment)
BUG REPORT:
/kind bug
What happened:
I'm running a simple Kubernetes cluster with a master and one node. All basic PODs are deploying successfully (e.g. weave-net), but when I try to deploy GlusterFS as a POD, container creation fails with the error: "Couldn't find an alternative telinit implementation to spawn"
This is the result of failing to run the init command from the gluster/gluster-centos Dockerfile. The weird part is that if I run this image directly on the node through docker, it runs smoothly.
What you expected to happen:
I'd expect to be able to deploy GlusterFS through Kubernetes properly.
How to reproduce it (as minimally and precisely as possible):
Set up a minimal cluster and try to deploy the following POD:
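(The original manifest wasn't preserved in this copy; the following is a hypothetical stand-in that reproduces the symptom — any image whose entrypoint is systemd will do, and the privileged setting is an assumption mirroring typical GlusterFS deployments.)

```yaml
# Hypothetical reproduction manifest; the original wasn't captured.
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs
spec:
  containers:
    - name: glusterfs
      image: gluster/gluster-centos   # entrypoint starts systemd
      securityContext:
        privileged: true              # assumption: gluster pods typically run privileged
```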
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration: virtualbox
- Kernel (e.g. `uname -a`):
- Install tools: vagrant