Ansible operator-sdk v1.5.0 with updated kube-rbac-proxy:v0.8.0 fails to run with permission denied #4684
Comments
FWIW, the default Go operator project for v1.5.0 uses kube-rbac-proxy v0.8.0, and all operator types pass CI. Does this happen with a newly-initialized operator, or with an operator that is trying to upgrade? For example, with a pod-level securityContext added:
# config/manager/manager.yaml
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  replicas: 1
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
+     securityContext:
+       runAsNonRoot: true
      containers:
/triage support |
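For context, that pod-level securityContext sits directly under spec.template.spec, next to the containers list; a minimal sketch of how the resulting section of config/manager/manager.yaml could look (container details trimmed, image name as used elsewhere in this project):
# config/manager/manager.yaml (sketch of where the setting lands)
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
      - name: manager
        image: controller:latest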
Hi @estroz, this is an ansible operator which I'm upgrading from an earlier operator-sdk release. I have tested what you suggested, adding that specific security context, and it didn't work:
Then deploy the changes:
$ make deploy
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11
/home/slopez/bin/kustomize build config/default | kubectl apply -f -
namespace/prometheus-exporter-operator-system unchanged
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net unchanged
serviceaccount/prometheus-exporter-operator-controller-manager unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding unchanged
service/prometheus-exporter-operator-controller-manager-metrics-service unchanged
deployment.apps/prometheus-exporter-operator-controller-manager configured
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor unchanged
Regarding what I said about … |
@estroz Quick update with more details, we have tested the … |
There seems to be a problem with 0.8.0 in openshift 4.6 and 4.7 where the rbac-proxy fails to start. Reverting to 0.5.0 while this issue is analyzed. Check operator-framework/operator-sdk#4684 for more details. This reverts commit 6c4c763.
It probably will be solved with #4655 (master). |
@camilamacedo86 Outputs from my previous comment #4684 (comment) refer to a … In addition, I have just tested those changes on the ansible operator (using its own serviceaccount), without success, same error:
$ git diff
diff --git a/config/default/manager_auth_proxy_patch.yaml b/config/default/manager_auth_proxy_patch.yaml
index 58dade9..92e80ff 100644
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,7 @@ spec:
spec:
containers:
- name: kube-rbac-proxy
- image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+ image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
args:
- "--secure-listen-address=0.0.0.0:8443"
- "--upstream=http://127.0.0.1:8080/"
diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index fb3d02a..36b82c3 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -23,6 +23,8 @@ spec:
control-plane: controller-manager
spec:
serviceAccountName: controller-manager
+ securityContext:
+ runAsNonRoot: true
containers:
- name: manager
args:
@@ -36,6 +38,8 @@ spec:
fieldRef:
fieldPath: metadata.annotations['olm.targetNamespaces']
image: controller:latest
+ securityContext:
+ allowPrivilegeEscalation: false
livenessProbe:
httpGet:
path: /healthz
$ make deploy
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11
/home/slopez/bin/kustomize build config/default | kubectl apply -f -
namespace/prometheus-exporter-operator-system unchanged
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net unchanged
serviceaccount/prometheus-exporter-operator-controller-manager unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding unchanged
service/prometheus-exporter-operator-controller-manager-metrics-service unchanged
deployment.apps/prometheus-exporter-operator-controller-manager configured
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor unchanged
$ oc get pods -n prometheus-exporter-operator-system
NAME READY STATUS RESTARTS AGE
prometheus-exporter-operator-controller-manager-5d8d8f69bflzl5q 2/2 Running 0 5h6m # the one with v0.5.0
prometheus-exporter-operator-controller-manager-68588876878thxk 1/2 CreateContainerError 0 58s # new one
$ oc describe pod prometheus-exporter-operator-controller-manager-68588876878thxk -n prometheus-exporter-operator-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m38s default-scheduler Successfully assigned prometheus-exporter-operator-system/prometheus-exporter-operator-controller-manager-68588876878thxk to ip-10-96-11-248.ec2.internal
Normal Started 3m36s kubelet Started container manager
Normal AddedInterface 3m36s multus Add eth0 [10.128.2.31/23]
Warning Failed 3m36s kubelet Error: container create failed: time="2021-03-22T15:12:47Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Normal Pulled 3m36s kubelet Container image "quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11" already present on machine
Normal Created 3m36s kubelet Created container manager
Warning Failed 3m35s kubelet Error: container create failed: time="2021-03-22T15:12:48Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 3m34s kubelet Error: container create failed: time="2021-03-22T15:12:49Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 3m23s kubelet Error: container create failed: time="2021-03-22T15:13:00Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 3m9s kubelet Error: container create failed: time="2021-03-22T15:13:14Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 2m57s kubelet Error: container create failed: time="2021-03-22T15:13:26Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 2m45s kubelet Error: container create failed: time="2021-03-22T15:13:38Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 2m31s kubelet Error: container create failed: time="2021-03-22T15:13:52Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Warning Failed 2m19s kubelet Error: container create failed: time="2021-03-22T15:14:04Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
Normal Pulled 111s (x11 over 3m36s) kubelet Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
Warning Failed 111s (x2 over 2m6s) kubelet (combined from similar events): Error: container create failed: time="2021-03-22T15:14:32Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied" |
@slopezz try setting the user/group in the kube-rbac-proxy container:
# config/default/manager_auth_proxy_patch.yaml
        ports:
        - containerPort: 8443
          name: https
+       securityContext:
+         runAsUser: 65532
+         runAsGroup: 65534
      - name: manager
        args:
        - "--health-probe-bind-address=:8081"
Got this from brancz/kube-rbac-proxy#101 |
I've tried this and see a failure with an "out of range" message on the kube-rbac-proxy container. I'm using CRC for testing, so this may be environment-specific.
|
@andrewazores you may actually need to use the OCP image (registry.redhat.io/openshift4/ose-kube-rbac-proxy) instead of the "upstream" one. Additionally, it looks like that image runs as a different user. |
I have just tried with that alternate image but see the same "out of range" failure I posted above. From what other reading I have done this appears to be due to security constraints applied within my cluster, although I'm unsure if that's CRC-specific or from the general OpenShift version installed. |
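For illustration: on OpenShift, the restricted SCC only admits UIDs from a per-namespace range recorded in annotations like the ones below (the range values here are made up), which is why a hard-coded runAsUser such as 65532 can be rejected as out of range:
# namespace annotations consulted by the restricted SCC (values are hypothetical)
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus-exporter-operator-system
  annotations:
    openshift.io/sa.scc.uid-range: "1000650000/10000"
    openshift.io/sa.scc.supplemental-groups: "1000650000/10000"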
@estroz I have tested the image that you suggested:
$ git diff
diff --git a/config/default/manager_auth_proxy_patch.yaml b/config/default/manager_auth_proxy_patch.yaml
index 58dade9..bc70b8b 100644
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,7 @@ spec:
spec:
containers:
- name: kube-rbac-proxy
- image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+ image: registry.redhat.io/openshift4/ose-kube-rbac-proxy:v4.7.0
args:
- "--secure-listen-address=0.0.0.0:8443"
- "--upstream=http://127.0.0.1:8080/"
diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index fb3d02a..36b82c3 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -23,6 +23,8 @@ spec:
control-plane: controller-manager
spec:
serviceAccountName: controller-manager
+ securityContext:
+ runAsNonRoot: true
containers:
- name: manager
args:
@@ -36,6 +38,8 @@ spec:
fieldRef:
fieldPath: metadata.annotations['olm.targetNamespaces']
image: controller:latest
+ securityContext:
+ allowPrivilegeEscalation: false
livenessProbe:
httpGet:
path: /healthz
$ make deploy
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11
/home/slopez/bin/kustomize build config/default | kubectl apply -f -
namespace/prometheus-exporter-operator-system created
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net created
serviceaccount/prometheus-exporter-operator-controller-manager created
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role created
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding created
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding created
service/prometheus-exporter-operator-controller-manager-metrics-service created
deployment.apps/prometheus-exporter-operator-controller-manager created
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor created
$ oc get pods -n prometheus-exporter-operator-system
NAME READY STATUS RESTARTS AGE
prometheus-exporter-operator-controller-manager-77f956bd4785sgl 2/2 Running 0 91s
The problem is that my operator is intended to run on both vanilla k8s and OpenShift, and this registry (registry.redhat.io) requires authentication to pull images.
|
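One way to handle that split, sketched here as an assumption rather than something proposed in this thread: keep the upstream gcr.io image in the base and swap it only in an OpenShift-specific kustomize overlay using the images transformer. The config/openshift path is hypothetical.
# config/openshift/kustomization.yaml (hypothetical overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../default
# replace only the proxy image; everything else comes from the shared base
images:
- name: gcr.io/kubebuilder/kube-rbac-proxy
  newName: registry.redhat.io/openshift4/ose-kube-rbac-proxy
  newTag: v4.7.0
Running kustomize build config/openshift | kubectl apply -f - (or a dedicated make target) would then deploy the OpenShift variant, while the default make deploy keeps the upstream image.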
This is a workaround for the permission error in issue: operator-framework/operator-sdk#4684 Signed-off-by: Wayne Sun <gsun@redhat.com>
I also hit this error using a helm-based operator. Either of these two images worked; v0.8.0 fails with the nonroot error. Note that the v4.7.0 tag generated an image pull error, while v4.7, which is what the tutorial at [1] specifies, works for me.
The tutorial [1] also says to use … |
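For concreteness, the image reference this comment describes as working would look something like the line below in config/default/manager_auth_proxy_patch.yaml, assuming the same ose-kube-rbac-proxy registry path used earlier in this thread:
# excerpt; tag per the comment above
      containers:
      - name: kube-rbac-proxy
        image: registry.redhat.io/openshift4/ose-kube-rbac-proxy:v4.7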
hey @sqqqrly, I assume you are using SDK … The … |
Thanks for the health probe note. I am hitting that and now I know why. |
Hi @slopezz, the downstream repo is now updated with this tag version. Note that it has mock projects which are tested against OCP; this ensures that it works on OCP as well. The next downstream release will be using its latest version too, so IMHO this can be closed. I will close this one; however, @slopezz, if you face any issue with the next downstream release for 4.8, could you please raise a new issue and add the steps performed so that we are able to reproduce it? Also, it might be a better fit for Bugzilla, since it is a vendor-specific issue and not part of the upstream scope. |
Bug Report
After upgrading an ansible operator to operator-sdk v1.5.0, the operator controller-manager pod never gets running because of an error in the kube-rbac-proxy container (which in operator-sdk v1.5.0 has been upgraded from v0.5.0 to v0.8.0).
I've seen some issues related to this in Go (#4402, kubernetes-sigs/kubebuilder#1978); for example, on Go operators v1.5.0 the scaffolding sets USER 65532:65532 and adds securityContext: allowPrivilegeEscalation: false to both the proxy and manager containers. But on Go operators v1.5.0, where this error does not appear, kube-rbac-proxy:v0.5.0 is still being used.
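As a point of comparison, a Go operator's resulting Deployment carries a container-level securityContext on both containers, roughly like this (a sketch assembled from the pieces quoted above, after kustomize merges config/manager/manager.yaml and config/default/manager_auth_proxy_patch.yaml):
# sketch: container-level securityContext on both containers (Go scaffolding)
      containers:
      - name: kube-rbac-proxy
        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
        securityContext:
          allowPrivilegeEscalation: false
      - name: manager
        image: controller:latest
        securityContext:
          allowPrivilegeEscalation: false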
What did you do?
Create operator-sdk scaffolding using operator-sdk v1.5.0.
What did you expect to see?
Controller-manager containers running OK.
What did you see instead? Under which circumstances?
Controller-manager kube-rbac-proxy container failing because of this error:
container_linux.go:366: starting container process caused: chdir to cwd ("/home/nonroot") set in config.json failed: permission denied
Environment
Operator type:
/language ansible
Kubernetes cluster type: OpenShift v4.7.0
$ operator-sdk version
$ kubectl version
Possible Solution
I tried to update container securityContext without success, error persisted.
Finally I have solved this error by downgrading kube-rbac-proxy from v0.8.0 to v0.5.0 (actually, the Go operator-sdk v1.5.0 stays with v0.5.0; it seems that only ansible-operator v1.5.0 has upgraded to v0.8.0, introducing the bug).
Additional context
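For reference, the workaround amounts to reverting the image bump in the scaffolded auth-proxy patch; a minimal excerpt of the relevant line (file path as used elsewhere in this issue):
# config/default/manager_auth_proxy_patch.yaml
      containers:
      - name: kube-rbac-proxy
        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0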