WebSocketBadStatusException: Handshake status 500 Internal Server Error: read: connection timed out (re-raised as ApiException) #36
Open
Migrated from internal repo.
Complete stack trace and logs (sensitive): https://github.com/AI-Safety-Institute/aisi-inspect-tools/issues/180
Original date: 06 Nov 2024
│ <redacted>.venv/lib/python3.12/site-packages/websocket/_handshake.py:150 in _get_resp_headers │
│ │
│ 147 │ │ │ ) # read the body of the HTTP error message response and include it in the │
│ 148 │ │ else: │
│ 149 │ │ │ response_body = None │
│ > 150 │ │ raise WebSocketBadStatusException( │
│ 151 │ │ │ f"Handshake status {status} {status_message} -+-+- {resp_headers} -+-+- {res │
│ 152 │ │ │ status, │
│ 153 │ │ │ status_message, │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
WebSocketBadStatusException: Handshake status 500 Internal Server Error -+-+- {'audit-id': '89c3bf06-0c1c-4a49-804a-8756578da43d', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 05 Nov 2024 23:39:52 GMT', 'content-length': '200'} -+-+- b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error dialing backend: read tcp 192.168.105.37:40968-\\u003e192.168.160.39:10250: read: connection timed out","code":500}\n'
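The exception message above packs three fields separated by ` -+-+- ` (the format string at line 151 of `_handshake.py`): the status line, the response headers, and the response body. The useful detail (the API server failing to dial the backend) is buried in the Kubernetes `Status` JSON inside that body. Below is a minimal sketch for pulling it out, assuming the message keeps exactly this three-field layout and that the body is printed as a bytes literal as in the output above; `parse_handshake_status` is a hypothetical helper, not part of websocket-client.

```python
import ast
import json
from typing import Any, Optional

SEPARATOR = " -+-+- "


def parse_handshake_status(message: str) -> Optional[dict[str, Any]]:
    """Extract the Kubernetes Status object from a handshake error message.

    Hypothetical helper. Assumes the layout
    "<status line> -+-+- <headers dict> -+-+- <repr of body bytes>".
    Returns None if the body is missing, not bytes, or not JSON.
    """
    parts = message.split(SEPARATOR, 2)
    if len(parts) != 3:
        return None
    try:
        # The body is printed as a bytes literal, e.g. b'{...}\n'.
        body = ast.literal_eval(parts[2].strip())
    except (ValueError, SyntaxError):
        return None
    if not isinstance(body, bytes):
        return None  # e.g. the body was None
    try:
        status = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return status if isinstance(status, dict) else None
```

Applied to the message above, the returned dict has `code` 500 and a `message` of `error dialing backend: read tcp 192.168.105.37:40968->192.168.160.39:10250: read: connection timed out`, with the JSON escape `\u003e` decoded back to `>`.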
...
│ <redacted>/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py:538 in │
│ websocket_call │
│ │
│ 535 │ │ else: │
│ 536 │ │ │ return WSResponse('%s' % ''.join(all)) │
│ 537 │ except (Exception, KeyboardInterrupt, SystemExit) as e: │
│ > 538 │ │ raise ApiException(status=0, reason=str(e)) │
│ 539 │
│ 540 │
│ 541 def portforward_call(configuration, _method, url, **kwargs): │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- {'audit-id': '89c3bf06-0c1c-4a49-804a-8756578da43d', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 05 Nov 2024 23:39:52 GMT', 'content-length': '200'} -+-+- b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error dialing backend: read tcp 192.168.105.37:40968-\\u003e192.168.160.39:10250: read: connection timed out","code":500}\n'
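At `ws_client.py:538` the Kubernetes client catches every exception and re-raises it as `ApiException(status=0, reason=str(e))`, so callers see status `0` and only the flattened message. Because that `raise` happens inside the `except` block, Python's implicit exception chaining should still keep the original exception on `__context__`. The sketch below demonstrates this with stand-in classes that mimic the pattern from the traceback; `FakeWebSocketBadStatusException`, `FakeApiException`, `fake_websocket_call` and `find_in_chain` are all hypothetical names, and it assumes the layers elided by `...` do not suppress the chain.

```python
from typing import Optional, Type


class FakeWebSocketBadStatusException(Exception):
    """Stand-in for websocket's WebSocketBadStatusException (hypothetical)."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class FakeApiException(Exception):
    """Stand-in for the Kubernetes client's ApiException (hypothetical)."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(f"({status})\nReason: {reason}")
        self.status = status
        self.reason = reason


def fake_websocket_call() -> None:
    # Mimics ws_client.py:537-538: catch everything, re-raise as ApiException.
    try:
        raise FakeWebSocketBadStatusException("Handshake status 500 Internal Server Error", 500)
    except Exception as e:
        raise FakeApiException(status=0, reason=str(e))


def find_in_chain(exc: BaseException, exc_type: Type[BaseException]) -> Optional[BaseException]:
    """Walk explicit (__cause__) and implicit (__context__) chaining for exc_type."""
    seen: set[int] = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        if isinstance(current, exc_type):
            return current
        seen.add(id(current))
        current = current.__cause__ or current.__context__
    return None


if __name__ == "__main__":
    try:
        fake_websocket_call()
    except FakeApiException as err:
        original = find_in_chain(err, FakeWebSocketBadStatusException)
        print(err.status, type(original).__name__)  # prints: 0 FakeWebSocketBadStatusException
```

Matching on the recovered exception type is sturdier than string-matching `reason`, though the real classes may expose different attributes than these stand-ins.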
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", "task_name": "<redacted>", "cmd": "['bash', '-c', "<redacted>", "stdin": "None", "cwd": "None", "timeout": "300"}
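The failing call carries `"timeout": "300"`, yet in the sandbox log it started at 22:56:15 and only errored at 23:39:53. It is not clear from this trace whether that timeout was meant to cover connection setup. One way to bound the whole call, handshake included, is to run the blocking client call in a worker thread and wait on it with a deadline. A minimal sketch, assuming the exec is a synchronous callable; `call_with_deadline` and `slow_exec` are hypothetical names. The worker thread is abandoned, not killed, on timeout.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared pool: an abandoned call keeps occupying its worker until it returns.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn: Callable[[], T], deadline_s: float) -> T:
    """Run a blocking call in a worker thread; raise TimeoutError after deadline_s.

    The underlying call is not interrupted and keeps running in the background.
    """
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the call has not started yet
        raise TimeoutError(f"call did not finish within {deadline_s} s") from None


if __name__ == "__main__":
    def slow_exec() -> str:
        time.sleep(0.5)  # stands in for a hung exec / WebSocket handshake
        return "done"

    try:
        call_with_deadline(slow_exec, deadline_s=0.05)
    except TimeoutError as err:
        print(err)  # prints: call did not finish within 0.05 s
```

Since the thread cannot be cancelled, this bounds how long the caller waits, not how long the connection lingers; the client's own socket timeouts would still be needed to free the worker.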
2024-11-05 22:51:14,385 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:51:14,594 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:51:14,594 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:51:15,019 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:52:19,320 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:52:19,532 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:52:19,533 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:52:19,661 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:53:28,437 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:53:33,670 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:53:33,670 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:53:33,813 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:55:34,132 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:55:34,402 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:55:34,402 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:55:34,549 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:56:15,437 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 23:39:53,808 - ERROR - K8S: Error during: Execute command in pod. {"cause": "(0)\nReason: Handshake status 500 Internal Server Error -+-+- {'audit-id': '89c3bf06-0c1c-4a49-804a-8756578da43d', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 05 Nov 2024 23:39:52 GMT', 'content-length': '200'} -+-+- b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"error dialing backend: read tcp 192.168.105.37:40968-\\\\u003e192.168.160.39:10250: read: connection timed out\",\"code\":500}\\n'\n", "pod": "agent-env-krcyegzg-default-0", ...
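The last exec in the sandbox log started at 22:56:15,437 and its error was logged at 23:39:53,808. A quick check of that gap, assuming both timestamps come from the same clock and use the `%Y-%m-%d %H:%M:%S,%f` layout shown in the log (`elapsed_seconds` is a hypothetical helper):

```python
from datetime import datetime

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # matches e.g. "2024-11-05 22:56:15,437"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two sandbox-log timestamps (same clock assumed)."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return delta.total_seconds()


started = "2024-11-05 22:56:15,437"  # "Starting: Execute command in pod."
failed = "2024-11-05 23:39:53,808"   # "Error during: Execute command in pod."
gap = elapsed_seconds(started, failed)
print(f"{gap:.3f} s ({gap / 60:.1f} min), timeout was 300 s")
# prints: 2618.371 s (43.6 min), timeout was 300 s
```

The call ran for about 43.6 minutes, far past the 300-second `timeout` in the K8sError context. The response's `date` header (23:39:52 GMT) lines up with the error timestamp, which suggests the sandbox log is in UTC; if so, the exec started about 31 seconds before the `NodeNotReady` event at 22:56:46Z.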
2024-11-05T22:50:10Z Warning agent-env-krcyegzg-default-0 FailedScheduling 0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 38 Insufficient memory. preemption: 0/46 nodes are available: 38 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling.
2024-11-05T22:50:20Z Normal agent-env-krcyegzg-default-0 Scheduled Successfully assigned agent/agent-env-krcyegzg-default-0 to ip-192-168-160-39.eu-west-2.compute.internal
2024-11-05T22:50:21Z Normal agent-env-krcyegzg-default-0 Created Created container resolve-coredns-ip
2024-11-05T22:50:21Z Normal agent-env-krcyegzg-default-0 Pulled Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-05T22:50:21Z Normal agent-env-krcyegzg-default-0 Started Started container resolve-coredns-ip
2024-11-05T22:50:22Z Normal agent-env-krcyegzg-default-0 Created Created container default
2024-11-05T22:50:22Z Normal agent-env-krcyegzg-default-0 Pulled Container image "redacted" already present on machine
2024-11-05T22:50:22Z Normal agent-env-krcyegzg-default-0 Started Started container default
2024-11-05T22:56:46Z Warning agent-env-krcyegzg-default-0 NodeNotReady Node is not ready
2024-11-05T23:01:51Z Normal agent-env-krcyegzg-default SuccessfulCreate create Pod agent-env-krcyegzg-default-0 in StatefulSet agent-env-krcyegzg-default successful
2024-11-05T23:01:51Z Normal agent-env-krcyegzg-default-0 TaintManagerEviction Marking for deletion Pod agent/agent-env-krcyegzg-default-0
2024-11-05T23:01:51Z Normal agent-env-krcyegzg-default-0 TaintManagerEviction Cancelling deletion of Pod agent/agent-env-krcyegzg-default-0
2024-11-05T23:02:22Z Warning agent-env-krcyegzg-default-0 FailedScheduling 0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 36 Insufficient memory, 4 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/46 nodes are available: 10 Preemption is not helpful for scheduling, 36 No preemption victims found for incoming pod.
2024-11-05T23:02:27Z Normal agent-env-krcyegzg-default-0 Scheduled Successfully assigned agent/agent-env-krcyegzg-default-0 to ip-192-168-104-100.eu-west-2.compute.internal
2024-11-05T23:02:28Z Normal agent-env-krcyegzg-default-0 Created Created container resolve-coredns-ip
2024-11-05T23:02:28Z Normal agent-env-krcyegzg-default-0 Started Started container resolve-coredns-ip
2024-11-05T23:02:28Z Normal agent-env-krcyegzg-default-0 Pulled Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-05T23:02:29Z Normal agent-env-krcyegzg-default-0 Pulled Container image "redacted" already present on machine
2024-11-05T23:02:30Z Normal agent-env-krcyegzg-default-0 Started Started container default
2024-11-05T23:02:30Z Normal agent-env-krcyegzg-default-0 Created Created container default
2024-11-05T23:39:55Z Normal agent-env-krcyegzg-default-0 Killing Stopping container default
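The backend the API server failed to dial, `192.168.160.39:10250`, appears to be the kubelet (10250 is the kubelet's default port) on the node the pod was first scheduled to, `ip-192-168-160-39.eu-west-2.compute.internal`, which is the node reported as not ready at 22:56:46. The pod was later rescheduled to `ip-192-168-104-100`, but the exec still targeted the original node. A small sketch that makes this correlation mechanical, assuming EC2-style private DNS node names (`ip-a-b-c-d.<region>.compute.internal`) and the `read tcp <local>-><remote>` wording from the error; `node_ip_from_name` and `dial_target` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# EC2-style private DNS name, e.g. ip-192-168-160-39.eu-west-2.compute.internal
_NODE_NAME = re.compile(r"^ip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")
# "read tcp <local>-><remote>: ..." once the JSON escape \u003e is decoded to ">"
_DIAL = re.compile(r"read tcp [\d.]+:\d+->([\d.]+):(\d+)")

KUBELET_PORT = 10250  # kubelet's default port


def node_ip_from_name(node_name: str) -> Optional[str]:
    """Recover the private IP encoded in an EC2-style node name."""
    m = _NODE_NAME.match(node_name)
    return ".".join(m.groups()) if m else None


def dial_target(status_message: str) -> Optional[Tuple[str, int]]:
    """Return the (ip, port) the API server was dialing, if present."""
    m = _DIAL.search(status_message)
    return (m.group(1), int(m.group(2))) if m else None


message = ("error dialing backend: read tcp 192.168.105.37:40968->"
           "192.168.160.39:10250: read: connection timed out")
first_node = "ip-192-168-160-39.eu-west-2.compute.internal"

target = dial_target(message)
print(target, target == (node_ip_from_name(first_node), KUBELET_PORT))
# prints: ('192.168.160.39', 10250) True
```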
This came from a large eval set whose max_samples was larger than the cluster could accommodate.
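Both `FailedScheduling` events break the 46 nodes down by reason, and in each the largest bucket is `Insufficient memory` (38, then 36), consistent with max_samples exceeding what the cluster could hold. A minimal parser for that summary, assuming the `0/N nodes are available: <count> <reason>, ...` layout seen in these two events; `parse_unschedulable` is a hypothetical name, and it deliberately ignores the separate `preemption:` clause.

```python
import re
from typing import Dict, Tuple

_HEADER = re.compile(r"^(\d+)/(\d+) nodes are available: (.*)$")
_ENTRY = re.compile(r"^(\d+) (.+)$")


def parse_unschedulable(message: str) -> Tuple[int, Dict[str, int]]:
    """Return (total_nodes, {reason: node_count}) from a FailedScheduling message.

    Hypothetical helper: only the text before ". preemption:" is parsed.
    """
    head = message.split(". preemption:", 1)[0].rstrip(".")
    m = _HEADER.match(head)
    if not m:
        raise ValueError(f"unrecognised FailedScheduling message: {message!r}")
    total = int(m.group(2))
    reasons: Dict[str, int] = {}
    for entry in m.group(3).split(", "):
        e = _ENTRY.match(entry)
        if e:
            reasons[e.group(2)] = reasons.get(e.group(2), 0) + int(e.group(1))
    return total, reasons


first = ("0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, "
         "2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, "
         "2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, "
         "2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, "
         "38 Insufficient memory. preemption: 0/46 nodes are available: "
         "38 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling.")
total, reasons = parse_unschedulable(first)
print(total, sum(reasons.values()), max(reasons, key=reasons.get))
# prints: 46 46 Insufficient memory
```

Tracking these counts across events also shows the `node.kubernetes.io/unreachable` bucket growing from 2 to 4 nodes between the two scheduling attempts.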