RuntimeError: can't start new thread #882
Comments
This indeed seems strange. Let me ask you a few questions to better understand the issue:
Thanks!
Hi @ralgozino, thanks for your reply. I will answer your questions, but I actually no longer need to deploy GPM on that cluster. I've figured out that trivy-operator had additionally been deployed on that cluster by another team. We do not need both, and maybe that is also the reason for the problems.
I'm glad you sorted it out! I'll probably do some tests with it myself, thanks!
Unfortunately, the same error occurs again on another cluster (without trivy-operator deployed). The Kubernetes version there is older: 1.21.14
Sorry to hear that, @Harald-koeln. Can you please check the pod events to see if there are any more details that could be useful for debugging?
Hi @ralgozino, here are the pod events:

Normal   Scheduled  7m21s                  default-scheduler  Successfully assigned gatekeeper-system/gatekeeper-policy-manager-75ff6b956-sdsxw to vache-3
Warning  Unhealthy  7m10s                  kubelet            Liveness probe failed: Get "http://10.42.3.168:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  7m1s                   kubelet            Liveness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59484->10.42.3.168:8080: read: connection reset by peer
Warning  Unhealthy  7m1s                   kubelet            Readiness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59486->10.42.3.168:8080: read: connection reset by peer
Warning  Unhealthy  6m51s                  kubelet            Liveness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59514->10.42.3.168:8080: read: connection reset by peer
Warning  Unhealthy  6m51s                  kubelet            Readiness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59512->10.42.3.168:8080: read: connection reset by peer
Warning  Unhealthy  6m46s (x3 over 7m12s)  kubelet            Readiness probe failed: Get "http://10.42.3.168:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  6m41s                  kubelet            Readiness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59542->10.42.3.168:8080: read: connection reset by peer
Warning  Unhealthy  6m41s                  kubelet            Liveness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59544->10.42.3.168:8080: read: connection reset by peer
Warning  Unhealthy  6m31s                  kubelet            Readiness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59566->10.42.3.168:8080: read: connection reset by peer
Normal   Killing    6m21s (x2 over 6m51s)  kubelet            Container gatekeeper-policy-manager failed liveness probe, will be restarted
Warning  Unhealthy  6m21s (x3 over 6m31s)  kubelet            (combined from similar events): Liveness probe failed: Get "http://10.42.3.168:8080/health": read tcp 10.42.3.1:59590->10.42.3.168:8080: read: connection reset by peer
Normal   Started    6m17s (x3 over 7m14s)  kubelet            Started container gatekeeper-policy-manager
Normal   Created    6m17s (x3 over 7m14s)  kubelet            Created container gatekeeper-policy-manager
Normal   Pulled     2m7s (x7 over 7m17s)   kubelet            Container image "quay.io/sighup/gatekeeper-policy-manager:v1.0.8" already present on machine
Hey @Harald-koeln, I tried reproducing the error with some load testing but I can't trigger it. Do you have any limits set on the number of processes that a container can run, or are the inodes used on the node close to their limit, maybe? Is there anything particular about your setup that we should know in order to replicate the issue? I also wonder if the same or a similar issue happens to you with the new Go backend that is in development; would you mind testing it? You just need to change the image tag to the one of the Go backend.
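For reference, one quick way to check whether the container's PID/thread budget is the problem could be a small script like the one below, run inside the GPM container (for example via kubectl exec). This is only a sketch: the script name is hypothetical and the cgroup file paths are assumptions that differ between cgroup v1 and v2.

# check_pids_limit.py - hypothetical helper, not part of GPM.
# Reads the cgroup pids limit and current usage from inside the container.
from pathlib import Path

def read_first_existing(paths):
    # Return the content of the first path that exists, or None if none do.
    for p in paths:
        f = Path(p)
        if f.exists():
            return f.read_text().strip()
    return None

limit = read_first_existing([
    "/sys/fs/cgroup/pids/pids.max",   # cgroup v1
    "/sys/fs/cgroup/pids.max",        # cgroup v2
])
current = read_first_existing([
    "/sys/fs/cgroup/pids/pids.current",  # cgroup v1
    "/sys/fs/cgroup/pids.current",       # cgroup v2
])
print(f"pids limit: {limit}, currently used: {current}")

If the limit is low, or the current value is close to it, Gunicorn's thread pool can hit that ceiling and fail with exactly this RuntimeError.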
Hi @ralgozino, thank you very much. The Go version is working on all 5 clusters where I observed problems with the Python version.
Glad to hear that! Any feedback on the Go backend version is very welcome :-)
Hello, on just one cluster (out of ~20) GPM is not starting (CrashLoopBackOff) with the log output below.
We are using version 0.7.0 and deploy the Helm chart with Argo CD. The Kubernetes version is 1.24.13.
Please let me know if any other information is needed.
Any help is appreciated. Thank you!
...
[2023-10-06 07:16:41 +0000] [8] [INFO] In cluster configuration loaded successfully.
[2023-10-06 07:16:41 +0000] [8] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 95, in init_process
super().init_process()
File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/base.py", line 142, in init_process
self.run()
File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 214, in run
callback(key.fileobj)
File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 150, in on_client_socket_readable
self.enqueue_req(conn)
File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 117, in enqueue_req
fs = self.tpool.submit(self.handle, conn)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 176, in submit
self._adjust_thread_count()
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 199, in _adjust_thread_count
t.start()
File "/usr/local/lib/python3.11/threading.py", line 957, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
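For context: CPython raises this RuntimeError when the operating system refuses to create another thread (pthread_create failing, typically with EAGAIN), which usually points to a process/thread limit being hit, such as a cgroup pids limit on the container or RLIMIT_NPROC, rather than to a bug in GPM's own code. Below is a minimal sketch that reproduces the same failure mode on Linux by artificially lowering RLIMIT_NPROC; it is only an illustration, and whether it triggers depends on the environment (for example, it may not trigger when running as root, since root can bypass RLIMIT_NPROC).

# Sketch: reproduce "RuntimeError: can't start new thread" on Linux by
# lowering the soft process/thread limit for the current user and then
# spawning threads until the kernel refuses to create more.
import resource
import threading
import time

soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
resource.setrlimit(resource.RLIMIT_NPROC, (64, hard))  # lower the soft limit only

threads = []
try:
    for _ in range(10_000):
        t = threading.Thread(target=time.sleep, args=(60,), daemon=True)
        t.start()  # raises RuntimeError once the limit is exceeded
        threads.append(t)
except RuntimeError as exc:
    print(f"failed after {len(threads)} threads: {exc}")  # "can't start new thread"

In a Kubernetes pod, the same condition is usually reached through the node's or runtime's pids limit rather than RLIMIT_NPROC, but the resulting traceback is identical to the one above.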