WebSocketBadStatusException pod does not exist (re-raised as ApiException) #40
Open
Description
Migrated from internal repo.
Complete stack trace and logs (sensitive) https://github.com/AI-Safety-Institute/aisi-inspect-tools/issues/149
Original date: 25 Oct 2024
During an eval, an ApiException was raised, which caused a task to fail.
"traceback": "Traceback (most recent call last):
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 528, in websocket_call
client = WSClient(configuration, url, headers, capture_all, binary=binary)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 68, in __init__
self.sock = create_websocket(configuration, url, headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 494, in create_websocket
websocket.connect(url, **connect_opt)
File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 261, in connect
self.handshake_response = handshake(self.sock, url, *addrs, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/websocket/_handshake.py\", line 65, in handshake
status, resp = _get_resp_headers(sock)
^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/websocket/_handshake.py\", line 150, in _get_resp_headers
raise WebSocketBadStatusException(
websocket._exceptions.WebSocketBadStatusException: Handshake status 404 Not Found -+-+- {'content-length': '18', 'content-type': 'text/plain; charset=utf-8', 'date': 'Fri, 25 Oct 2024 12:35:35 GMT'} -+-+- b'pod does not exist'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 266, in task_run
sample_results = await asyncio.gather(*sample_coroutines)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 431, in task_run_sample
error = sample_error(ex)
^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/error.py\", line 22, in __call__
raise ex
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 423, in task_run_sample
state = await plan(state, generate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_plan.py\", line 106, in __call__
state = await solver(state, generate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_basic_agent.py\", line 184, in solve
tool_results = await call_tools(state.output.message, state.tools)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 154, in call_tools
results = await asyncio.gather(*tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 74, in call_tool_task
result = await call_tool(tdefs, message.text, call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 196, in call_tool
result = await tool_def.tool(**arguments)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/...bash.py\", line 48, in bash
result, new_cwd = await run_bash_command(
^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/...bash.py\", line 98, in run_bash_command
result = await bash_sandbox.exec([\"bash\", \"-c\", code], timeout=timeout_seconds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/sandbox_environment.py\", line 102, in exec
return await self._pod.exec(cmd, input, cwd, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 91, in exec
result = await self._run_asynchronously(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 149, in _run_asynchronously
return await loop.run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/lib/python3.12/concurrent/futures/thread.py\", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 92, in <lambda>
lambda: executor.exec(cmd, stdin, cwd, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 246, in exec
with self._interactive_shell(timeout) as ws_client:
File \"/usr/lib/python3.12/contextlib.py\", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 266, in _interactive_shell
ws_client: WSClient = stream(
^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/stream.py\", line 36, in _websocket_request
out = api_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py\", line 994, in connect_get_namespaced_pod_exec
return self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py\", line 1101, in connect_get_namespaced_pod_exec_with_http_info
return self.api_client.call_api(
^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py\", line 348, in call_api
return self.__call_api(resource_path, method,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api
response_data = self.request(
^^^^^^^^^^^^^
File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 538, in websocket_call
raise ApiException(status=0, reason=str(e))
kubernetes.client.exceptions.ApiException: (0)
Reason: Handshake status 404 Not Found -+-+- {'content-length': '18', 'content-type': 'text/plain; charset=utf-8', 'date': 'Fri, 25 Oct 2024 12:35:35 GMT'} -+-+- b'pod does not exist'
",
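Because the re-raised ApiException carries status=0, the original 404 only survives inside the reason text. A minimal sketch of how a caller might recognise this case from that text; the helper names `handshake_status` and `is_pod_missing` are hypothetical and not part of kubernetes or aisitools, and the string matching assumes the message format shown in the traceback above:

```python
import re
from typing import Optional

# The reason text embeds the websocket handshake failure, e.g.
# "Handshake status 404 Not Found -+-+- {...} -+-+- b'pod does not exist'"
_HANDSHAKE_RE = re.compile(r"Handshake status (\d{3})")


def handshake_status(reason: str) -> Optional[int]:
    """Return the HTTP status embedded in the reason text, or None if absent."""
    match = _HANDSHAKE_RE.search(reason)
    return int(match.group(1)) if match else None


def is_pod_missing(reason: str) -> bool:
    """True when the handshake failed with 404 and the body says the pod is gone."""
    return handshake_status(reason) == 404 and "pod does not exist" in reason


# Reason string copied from the traceback above.
reason = (
    "Handshake status 404 Not Found -+-+- "
    "{'content-length': '18', 'content-type': 'text/plain; charset=utf-8', "
    "'date': 'Fri, 25 Oct 2024 12:35:35 GMT'} -+-+- b'pod does not exist'"
)
print(handshake_status(reason), is_pod_missing(reason))
```

In the sandbox code this check would presumably sit in an `except ApiException as e:` handler and be applied to `str(e.reason)`, so that a vanished pod can be reported with a clearer error than a bare status 0.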
The key info is "pod does not exist". Looking at `kubectl get events` for `agent-env-w3qxf7hy-default-0`, we can see:
2024-10-25T12:20:11Z Normal agent-env-w3qxf7hy-default-0 Scheduled Successfully assigned agent/agent-env-w3qxf7hy-default-0 to ip-192-168-118-142.eu-west-2.compute.internal
2024-10-25T12:20:13Z Normal agent-env-w3qxf7hy-default-0 Created Created container resolve-coredns-ip
2024-10-25T12:20:13Z Normal agent-env-w3qxf7hy-default-0 Pulled Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-25T12:20:13Z Normal agent-env-w3qxf7hy-default-0 Started Started container resolve-coredns-ip
2024-10-25T12:20:14Z Normal agent-env-w3qxf7hy-default-0 Pulled Container image "redacted" already present on machine
2024-10-25T12:20:15Z Normal agent-env-w3qxf7hy-default-0 Created Created container default
2024-10-25T12:20:15Z Normal agent-env-w3qxf7hy-default-0 Started Started container default
2024-10-25T12:22:27Z Warning agent-env-w3qxf7hy-default-0 NodeNotReady Node is not ready
2024-10-25T12:27:32Z Normal agent-env-w3qxf7hy-default-0 TaintManagerEviction Marking for deletion Pod agent/agent-env-w3qxf7hy-default-0
2024-10-25T12:27:32Z Normal agent-env-w3qxf7hy-default-0 Scheduled Successfully assigned agent/agent-env-w3qxf7hy-default-0 to ip-192-168-103-228.eu-west-2.compute.internal
2024-10-25T12:27:32Z Normal agent-env-w3qxf7hy-default SuccessfulCreate create Pod agent-env-w3qxf7hy-default-0 in StatefulSet agent-env-w3qxf7hy-default successful
2024-10-25T12:27:33Z Normal agent-env-w3qxf7hy-default-0 Pulled Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-25T12:27:33Z Normal agent-env-w3qxf7hy-default-0 Created Created container resolve-coredns-ip
2024-10-25T12:27:34Z Normal agent-env-w3qxf7hy-default-0 Started Started container resolve-coredns-ip
2024-10-25T12:27:34Z Normal agent-env-w3qxf7hy-default-0 Pulled Container image "redacted" already present on machine
2024-10-25T12:27:34Z Normal agent-env-w3qxf7hy-default-0 Created Created container default
2024-10-25T12:27:35Z Normal agent-env-w3qxf7hy-default-0 Started Started container default
2024-10-25T12:35:35Z Normal agent-env-w3qxf7hy-default-0 Killing Stopping container default
2024-10-25T12:35:36Z Normal agent-env-w3qxf7hy-default-0 Killing Stopping container default
Note the NodeNotReady warning at 12:22:27, after which it looks like the pod was evicted (TaintManagerEviction) and rescheduled onto a different node at 12:27:32.
I don't know why the node became not ready.
Many evals were also running at the time on old versions of challenges that did not specify resource requests.
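The reschedule can be checked mechanically by pulling the node name out of each Scheduled event. A rough sketch, assuming the whitespace-separated column layout shown in the event listing (time, type, object, reason, message); `parse_event` and `scheduled_nodes` are hypothetical helpers written for illustration:

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    time: str
    kind: str
    obj: str
    reason: str
    message: str


def parse_event(line: str) -> Event:
    # The first four columns are single tokens; the message is the remainder.
    return Event(*line.split(None, 4))


def scheduled_nodes(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, node name) for each Scheduled event, in order."""
    nodes = []
    for line in lines:
        event = parse_event(line)
        if event.reason == "Scheduled":
            # The message ends with "... to <node-name>".
            nodes.append((event.time, event.message.rsplit(" to ", 1)[-1]))
    return nodes


# Three lines copied from the event listing above.
events = [
    "2024-10-25T12:20:11Z Normal agent-env-w3qxf7hy-default-0 Scheduled Successfully assigned agent/agent-env-w3qxf7hy-default-0 to ip-192-168-118-142.eu-west-2.compute.internal",
    "2024-10-25T12:22:27Z Warning agent-env-w3qxf7hy-default-0 NodeNotReady Node is not ready",
    "2024-10-25T12:27:32Z Normal agent-env-w3qxf7hy-default-0 Scheduled Successfully assigned agent/agent-env-w3qxf7hy-default-0 to ip-192-168-103-228.eu-west-2.compute.internal",
]
nodes = scheduled_nodes(events)
for time, node in nodes:
    print(time, node)
```

Two different node names in the result indicate that the StatefulSet pod was recreated on a new node under the same name, which would explain an exec against the old pod instance returning "pod does not exist".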