Description
What happened + What you expected to happen
We observed in production that upon draining of the Proxy on an empty node, these are being shutdown after just 5s as compared to default configuration of 30s:
{"levelname": "INFO", "asctime": "2023-12-06 16:20:34,778", "component_name": "controller", "component_id": "1672", "message": "proxy_state.py:546 - Start to drain the proxy actor on node 3f12472660c6c94f3649dc0084dabca336889465b71ef6b47ab3f199"}
{"levelname": "INFO", "asctime": "2023-12-06 16:20:39,931", "component_name": "controller", "component_id": "1672", "message": "proxy_state.py:794 - Removing drained proxy on node '3f12472660c6c94f3649dc0084dabca336889465b71ef6b47ab3f199'."}
This results in requests being dropped before our Load Balancer gets a chance to evict the node from its routing set.
Versions / Dependencies
2.8
Reproduction script
Kill all replicas on the worker node, wait for proxies to start draining and eventually killed
Issue Severity
Medium: It is a significant difficulty but I can work around it.