Send large arrow binaries as chunks, add client-level Websocket heartbeat #1209
Conversation
msg.is_transferable = true;
msg.arrow_length = arrow.byteLength;
I nominate we combine this property with `is_transferable`, since these are now always coincident, and let the presence of the `arrow_length` property imply `is_transferable`.

Also - maybe `buffer_length` is a more conventional term here than `arrow_length`? We don't really opine on the nature of a `msg` at this level of the API anywhere else.
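A minimal sketch of what this nomination might look like on the receiving side, assuming a hypothetical handler and state variable; only `arrow_length` itself appears in the diff above:

```javascript
// Hypothetical sketch: the presence of `arrow_length` alone signals that a
// binary payload of that many bytes follows, so a separate `is_transferable`
// flag becomes redundant.
let pending_arrow_length = null;

function handle_message(msg) {
    if (msg.arrow_length !== undefined) {
        // Expect `msg.arrow_length` bytes of binary data in subsequent frames.
        pending_arrow_length = msg.arrow_length;
    }
}
```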
req.ws.send(transferable[0]);
setTimeout(() => {
    this._post_chunked(req, arrow, 0, this.chunk_size, arrow.byteLength);
}, 0);
Is there a need to defer this first step? We have the `ArrayBuffer` in hand already, and it seems deferring here may simply introduce another event-loop-flush delay to arrow-containing messages that are only a single chunk in length.
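A sketch of the non-deferred alternative this comment suggests, reusing only the identifiers visible in the diff above; whether `_post_chunked` can safely be called synchronously here is an assumption:

```javascript
// Send the first chunk immediately rather than via setTimeout(..., 0); the
// ArrayBuffer is already in hand, so single-chunk messages incur no extra
// event-loop flush. Any later chunks inside _post_chunked can still yield.
req.ws.send(transferable[0]);
this._post_chunked(req, arrow, 0, this.chunk_size, arrow.byteLength);
```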
Defaults to 200MB.
chunk_size (:obj:`int`): Binary messages above ``chunk_threshold``
    will be passed in chunks of this length (in bytes). Defaults
    to 20MB.
Similarly, I nominate we combine `chunk_threshold` and `chunk_size` into a single variable. Even under these default values, it is possible/likely to get scenarios with an average chunk size far above the `chunk_size` param by targeting e.g. 190MB requests, which are not large enough to trigger the threshold but are 9.5x over the intended loop chunk size.

Ideally, this is a variable the developer never needs to touch, and it may even be removed at some point if we can automate it. Something like this:
chunk_size (:obj:`int`): Binary messages will not exceed this length
    (in bytes); payloads above this limit will be chunked
    across multiple messages. Defaults to `20971520` (20MB), and can
    be disabled entirely with `None`.
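A minimal sketch of the single-parameter behavior that docstring describes; the helper name is hypothetical and not part of the PR:

```python
def iter_chunks(binary, chunk_size=20971520):
    """Yield slices of ``binary`` no longer than ``chunk_size`` bytes.

    With ``chunk_size=None`` chunking is disabled and the payload is
    yielded whole, matching the proposed docstring above.
    """
    if chunk_size is None:
        yield binary
        return
    for start in range(0, len(binary), chunk_size):
        yield binary[start:start + chunk_size]
```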
…remove websocket_ping_interval
…h pending_binary etc
Combined: also fix & test an issue with asynchronous logic, namely that once a client is in the "pending_binary" state, it cannot process other messages. This is only sometimes an issue, as clients tend to serialize themselves by …

Before: a call to …

After: …
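A hypothetical illustration of the hazard described above, with made-up handler names; the actual fix in this PR is not shown here:

```javascript
// While a binary arrow payload is pending, interleaved text frames (such as
// heartbeats) can still arrive and must be dispatched normally instead of
// being mistaken for the pending binary.
ws.onmessage = event => {
    if (pending_binary && event.data instanceof ArrayBuffer) {
        accumulate_chunk(event.data); // hypothetical helper
    } else {
        handle_json(JSON.parse(event.data)); // hypothetical helper
    }
};
```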
This PR implements chunked sending/chunked loading of binaries in Perspective's websocket implementations in both Python and Javascript. Apache Arrow binaries above a certain size will be sent to the Perspective client in chunks, which prevents the Websocket from locking on a particularly large binary and closing due to a timeout, as once that binary is on the wire all heartbeat messages/other messages are blocked from being sent/received.

The default chunk threshold is configured to be 200MB, and arrows will be transported in 50MB chunks. In Python, these settings can be altered in the `PerspectiveManager` constructor by initializing a new `PerspectiveManager` with `chunk_threshold` and `chunk_size`, which are defined in bytes:
In NodeJS, these settings can be altered on the `WebsocketManager`:
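A sketch of the equivalent NodeJS configuration; the module path and the way the options are set on the manager are assumptions based on the description, not taken from the diff:

```javascript
const perspective = require("@finos/perspective");

// Hypothetical: assumes the manager exposes `chunk_threshold` and
// `chunk_size` (in bytes) as settable properties, per the PR description.
const manager = new perspective.WebSocketManager();
manager.chunk_threshold = 200 * 1024 * 1024; // 200MB
manager.chunk_size = 50 * 1024 * 1024;       // 50MB
```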
Additionally, this PR implements a client-level heartbeat message (ping/pong) between the Perspective client and server, which should aid in debugging as heartbeat messages now show up in the browser debug console. In Tornado, servers previously set up with `websocket_ping_interval` should remove their `websocket_ping_interval` setting and depend on the client-level heartbeats to keep the Perspective websocket connection alive.

Changelog
- Adds chunked sending of large binary messages to `PerspectiveTornadoServer` in Python and `WebSocketManager` in JS
- Adds a client-level Websocket heartbeat to `WebSocketClient` in JS and `PerspectiveTornadoClient` in Python