[Serve] Draining Proxy is shut down prematurely #41726

@alexeykudinkin

Description

What happened + What you expected to happen

We observed in production that when the Proxy on an empty node is drained, it is shut down after just ~5s, rather than after the default draining period of 30s:

{"levelname": "INFO", "asctime": "2023-12-06 16:20:34,778", "component_name": "controller", "component_id": "1672", "message": "proxy_state.py:546 - Start to drain the proxy actor on node 3f12472660c6c94f3649dc0084dabca336889465b71ef6b47ab3f199"}
{"levelname": "INFO", "asctime": "2023-12-06 16:20:39,931", "component_name": "controller", "component_id": "1672", "message": "proxy_state.py:794 - Removing drained proxy on node '3f12472660c6c94f3649dc0084dabca336889465b71ef6b47ab3f199'."}

This results in requests being dropped before our Load Balancer gets a chance to evict the node from its routing set.
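To double-check the timing, the gap between the two controller log lines above can be computed directly from their `asctime` fields. The following is a minimal sketch, not part of Ray: the helper names (`parse_asctime`, `drain_duration_s`) are made up for illustration, the log messages are abridged, and the 30s figure is simply the default mentioned in this report.

```python
import json
from datetime import datetime

# Abridged copies of the two controller log lines from this report
# (messages shortened; timestamps unchanged).
LOG_LINES = [
    '{"levelname": "INFO", "asctime": "2023-12-06 16:20:34,778", '
    '"message": "proxy_state.py:546 - Start to drain the proxy actor"}',
    '{"levelname": "INFO", "asctime": "2023-12-06 16:20:39,931", '
    '"message": "proxy_state.py:794 - Removing drained proxy"}',
]

# Default draining period as stated in this report (assumption: 30 seconds).
EXPECTED_DRAIN_S = 30.0


def parse_asctime(line: str) -> datetime:
    """Parse the 'asctime' field of a JSON-formatted log line."""
    record = json.loads(line)
    return datetime.strptime(record["asctime"], "%Y-%m-%d %H:%M:%S,%f")


def drain_duration_s(start_line: str, end_line: str) -> float:
    """Seconds elapsed between the drain-start and proxy-removal log lines."""
    return (parse_asctime(end_line) - parse_asctime(start_line)).total_seconds()


if __name__ == "__main__":
    observed = drain_duration_s(*LOG_LINES)
    print(f"observed drain: {observed:.3f}s, expected at least {EXPECTED_DRAIN_S:.0f}s")
```

With the timestamps above, the observed drain is about 5.15s, well short of the 30s default.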

Versions / Dependencies

Ray 2.8

Reproduction script

Kill all replicas on the worker node, then wait for the proxy on that node to start draining and eventually be killed.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Metadata

Labels

P0: Issues that should be fixed in short order
bug: Something that is supposed to be working; but isn't
ray 2.9: Issues targeting Ray 2.9 release (~Q4 CY2023)
release-blocker: P0 Issue that blocks the release
