WebSocketBadStatusException Handshake status 500 Internal Server Error: read: connection timed out (re-raised as ApiException) #36

Open
@craigwalton-dsit

Description

Migrated from internal repo.
Complete stack trace and logs (sensitive) https://github.com/AI-Safety-Institute/aisi-inspect-tools/issues/180
Original date: 06 Nov 2024

│ <redacted>.venv/lib/python3.12/site-packages/websocket/_handshake.py:150 in _get_resp_headers    │
│                                                                                                                      │
│   147 │   │   │   )  # read the body of the HTTP error message response and include it in the                        │
│   148 │   │   else:                                                                                                  │
│   149 │   │   │   response_body = None                                                                               │
│ > 150 │   │   raise WebSocketBadStatusException(                                                                     │
│   151 │   │   │   f"Handshake status {status} {status_message} -+-+- {resp_headers} -+-+- {res                       │
│   152 │   │   │   status,                                                                                            │
│   153 │   │   │   status_message,                                                                                    │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
WebSocketBadStatusException: Handshake status 500 Internal Server Error -+-+- {'audit-id': '89c3bf06-0c1c-4a49-804a-8756578da43d', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 05 Nov 2024 23:39:52 GMT', 'content-length': '200'} -+-+- b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error dialing backend: read tcp 192.168.105.37:40968-\\u003e192.168.160.39:10250: read: connection timed out","code":500}\n'
...
│ <redacted>/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py:538 in               │
│ websocket_call                                                                                                       │
│                                                                                                                      │
│   535 │   │   else:                                                                                                  │
│   536 │   │   │   return WSResponse('%s' % ''.join(all))                                                             │
│   537 │   except (Exception, KeyboardInterrupt, SystemExit) as e:                                                    │
│ > 538 │   │   raise ApiException(status=0, reason=str(e))                                                            │
│   539                                                                                                                │
│   540                                                                                                                │
│   541 def portforward_call(configuration, _method, url, **kwargs):                                                   │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- {'audit-id': '89c3bf06-0c1c-4a49-804a-8756578da43d', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 05 Nov 2024 23:39:52 GMT', 'content-length': '200'} -+-+- b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error dialing backend: read tcp 192.168.105.37:40968-\\u003e192.168.160.39:10250: read: connection timed out","code":500}\n'
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", "task_name": "<redacted>", "cmd": "['bash', '-c', "<redacted>", "stdin": "None", "cwd": "None", "timeout": "300"}
2024-11-05 22:51:14,385 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:51:14,594 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:51:14,594 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:51:15,019 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:52:19,320 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:52:19,532 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:52:19,533 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:52:19,661 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:53:28,437 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:53:33,670 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:53:33,670 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:53:33,813 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:55:34,132 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:55:34,402 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-05 22:55:34,402 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:55:34,549 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 22:56:15,437 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-krcyegzg-default-0", ...
2024-11-05 23:39:53,808 - ERROR - K8S: Error during: Execute command in pod. {"cause": "(0)\nReason: Handshake status 500 Internal Server Error -+-+- {'audit-id': '89c3bf06-0c1c-4a49-804a-8756578da43d', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 05 Nov 2024 23:39:52 GMT', 'content-length': '200'} -+-+- b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"error dialing backend: read tcp 192.168.105.37:40968-\\\\u003e192.168.160.39:10250: read: connection timed out\",\"code\":500}\\n'\n", "pod": "agent-env-krcyegzg-default-0", ...
2024-11-05T22:50:10Z   Warning   agent-env-krcyegzg-default-0                  FailedScheduling        0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 38 Insufficient memory. preemption: 0/46 nodes are available: 38 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling.
2024-11-05T22:50:20Z   Normal    agent-env-krcyegzg-default-0                  Scheduled               Successfully assigned agent/agent-env-krcyegzg-default-0 to ip-192-168-160-39.eu-west-2.compute.internal
2024-11-05T22:50:21Z   Normal    agent-env-krcyegzg-default-0                  Created                 Created container resolve-coredns-ip
2024-11-05T22:50:21Z   Normal    agent-env-krcyegzg-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-05T22:50:21Z   Normal    agent-env-krcyegzg-default-0                  Started                 Started container resolve-coredns-ip
2024-11-05T22:50:22Z   Normal    agent-env-krcyegzg-default-0                  Created                 Created container default
2024-11-05T22:50:22Z   Normal    agent-env-krcyegzg-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-05T22:50:22Z   Normal    agent-env-krcyegzg-default-0                  Started                 Started container default
2024-11-05T22:56:46Z   Warning   agent-env-krcyegzg-default-0                  NodeNotReady            Node is not ready
2024-11-05T23:01:51Z   Normal    agent-env-krcyegzg-default                    SuccessfulCreate        create Pod agent-env-krcyegzg-default-0 in StatefulSet agent-env-krcyegzg-default successful
2024-11-05T23:01:51Z   Normal    agent-env-krcyegzg-default-0                  TaintManagerEviction    Marking for deletion Pod agent/agent-env-krcyegzg-default-0
2024-11-05T23:01:51Z   Normal    agent-env-krcyegzg-default-0                  TaintManagerEviction    Cancelling deletion of Pod agent/agent-env-krcyegzg-default-0
2024-11-05T23:02:22Z   Warning   agent-env-krcyegzg-default-0                  FailedScheduling        0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 36 Insufficient memory, 4 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/46 nodes are available: 10 Preemption is not helpful for scheduling, 36 No preemption victims found for incoming pod.
2024-11-05T23:02:27Z   Normal    agent-env-krcyegzg-default-0                  Scheduled               Successfully assigned agent/agent-env-krcyegzg-default-0 to ip-192-168-104-100.eu-west-2.compute.internal
2024-11-05T23:02:28Z   Normal    agent-env-krcyegzg-default-0                  Created                 Created container resolve-coredns-ip
2024-11-05T23:02:28Z   Normal    agent-env-krcyegzg-default-0                  Started                 Started container resolve-coredns-ip
2024-11-05T23:02:28Z   Normal    agent-env-krcyegzg-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-05T23:02:29Z   Normal    agent-env-krcyegzg-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-05T23:02:30Z   Normal    agent-env-krcyegzg-default-0                  Started                 Started container default
2024-11-05T23:02:30Z   Normal    agent-env-krcyegzg-default-0                  Created                 Created container default
2024-11-05T23:39:55Z   Normal    agent-env-krcyegzg-default-0                  Killing                 Stopping container default
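
`websocket_call` re-raises the handshake failure as `ApiException(status=0, reason=str(e))`, so the Kubernetes `Status` body survives only as text inside `reason`. Below is a minimal sketch of recovering it. It assumes the reason always follows the `Handshake status <code> <message> -+-+- <headers> -+-+- <body>` layout seen in this issue. That layout is an implementation detail of the websocket-client version used here, not a stable API. `parse_handshake_reason` and `is_kubelet_dial_failure` are hypothetical helper names.

```python
import ast
import json

# Separator used in the websocket-client handshake error message shown above.
# Assumption: this layout is specific to the library version in this issue.
SEP = " -+-+- "


def parse_handshake_reason(reason: str) -> dict:
    """Hypothetical helper: split an ApiException reason into status, headers and body."""
    # str(ApiException) prefixes "(0)\nReason: "; accept either that or the bare reason.
    if "Reason: " in reason:
        reason = reason.split("Reason: ", 1)[1]
    status_part, headers_part, body_part = reason.strip().split(SEP, 2)
    # "Handshake status 500 Internal Server Error" -> 500
    http_status = int(status_part.split()[2])
    headers = ast.literal_eval(headers_part)  # repr of a dict -> dict
    body = ast.literal_eval(body_part)  # repr of bytes (b'...') -> bytes
    return {
        "http_status": http_status,
        "headers": headers,
        "k8s_status": json.loads(body),  # Kubernetes "Status" object
    }


def is_kubelet_dial_failure(reason: str) -> bool:
    """Hypothetical helper: True if the API server could not reach the node's kubelet."""
    try:
        parsed = parse_handshake_reason(reason)
    except (ValueError, SyntaxError, IndexError, TypeError):
        return False  # reason is not in the expected layout
    status = parsed["k8s_status"]
    if not isinstance(status, dict):
        return False
    message = str(status.get("message", ""))
    return parsed["http_status"] == 500 and message.startswith("error dialing backend")
```

A caller could then use a check such as `if e.status == 0 and is_kubelet_dial_failure(e.reason)` inside an `except ApiException as e:` block. This would let it report the failure as a node/infrastructure problem rather than as a failure of the command itself. Matching on the message prefix is a heuristic and may need adjusting for other Kubernetes versions.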

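This occurred in a large eval set whose `max_samples` was larger than the cluster could accommodate. The failing exec started at 22:56:15 while the pod was on node ip-192-168-160-39 (192.168.160.39), which was reported `NodeNotReady` at 22:56:46. The StatefulSet then recreated the pod, which was scheduled onto ip-192-168-104-100 at 23:02:27. The original exec call did not return until 23:39:53, when the API server's connection to the kubelet on 192.168.160.39:10250 timed out. This was far beyond the command's 300 s timeout.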
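
The sandbox log and event timestamps can be lined up to show how long the failing exec actually ran. Below is a small sketch of that arithmetic. It assumes the sandbox log timestamps are UTC, since they agree with the `date: ... GMT` response header to within a second, and that the logged `"timeout": "300"` is in seconds.

```python
from datetime import datetime, timezone

# Sandbox log format, e.g. "2024-11-05 22:56:15,437" (assumed to be UTC).
LOG_FMT = "%Y-%m-%d %H:%M:%S,%f"
# Kubernetes event format, e.g. "2024-11-05T22:56:46Z" (explicitly UTC).
EVENT_FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse_log(ts: str) -> datetime:
    return datetime.strptime(ts, LOG_FMT).replace(tzinfo=timezone.utc)


def parse_event(ts: str) -> datetime:
    return datetime.strptime(ts, EVENT_FMT).replace(tzinfo=timezone.utc)


exec_started = parse_log("2024-11-05 22:56:15,437")  # last "Starting: Execute command in pod"
exec_failed = parse_log("2024-11-05 23:39:53,808")  # "ERROR - K8S: Error during: Execute command"
node_not_ready = parse_event("2024-11-05T22:56:46Z")  # NodeNotReady event
exec_timeout_s = 300  # "timeout" field logged with the failing command (assumed seconds)

elapsed_s = (exec_failed - exec_started).total_seconds()
not_ready_after_s = (node_not_ready - exec_started).total_seconds()

print(f"exec ran for {elapsed_s:.3f} s against a timeout of {exec_timeout_s} s")
print(f"node went NotReady {not_ready_after_s:.3f} s after the exec started")
```

On these timestamps the call ran for about 2618 s (roughly 43.6 minutes), nearly nine times the configured timeout. This suggests the timeout was not enforced while the websocket was waiting on the unreachable node.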

Labels: 3rd party errors (errors observed from third-party code such as websocket or SSL errors)