Protobuf to Arrow, using Rust
Take a protobuf:
```protobuf
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
```
And convert serialized messages directly to a `pyarrow.RecordBatch`:
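```python
from ptars import HandlerPool

# SearchRequest is the class protoc generates from the .proto definition above
# (import it from your generated *_pb2 module).
messages = [
    SearchRequest(
        query="protobuf to arrow",
        page_number=0,
        result_per_page=10,
    ),
    SearchRequest(
        query="protobuf to arrow",
        page_number=1,
        result_per_page=10,
    ),
]
payloads = [message.SerializeToString() for message in messages]

# Build (or reuse) a handler for this message type, then convert in one call.
pool = HandlerPool()
handler = pool.get_for_message(SearchRequest.DESCRIPTOR)
record_batch = handler.list_to_record_batch(payloads)
```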
The resulting `record_batch` contains:

| query | page_number | result_per_page |
|---|---|---|
| protobuf to arrow | 0 | 10 |
| protobuf to arrow | 1 | 10 |
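To make the proto-to-arrow direction concrete, here is a minimal pure-Python sketch that decodes serialized `SearchRequest` payloads into one list per column by reading the protobuf wire format directly. It is not how ptars is implemented (ptars does this work in Rust); the helper names `read_varint`, `to_int32`, `skip_field` and `payloads_to_columns` are hypothetical, and the three fields of the message above are hard-coded rather than read from a descriptor.

```python
# Hypothetical sketch of proto -> columns for SearchRequest; not ptars's code.

# Field number -> (column name, wire type) for the SearchRequest schema.
SCHEMA = {
    1: ("query", 2),            # string: length-delimited
    2: ("page_number", 0),      # int32: varint
    3: ("result_per_page", 0),  # int32: varint
}
DEFAULTS = {"query": "", "page_number": 0, "result_per_page": 0}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def to_int32(value: int) -> int:
    """Truncate a decoded varint to a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def skip_field(buf: bytes, pos: int, wire_type: int) -> int:
    """Skip the payload of a field we do not know; return the next position."""
    if wire_type == 0:
        _, pos = read_varint(buf, pos)
        return pos
    if wire_type == 1:
        return pos + 8
    if wire_type == 2:
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == 5:
        return pos + 4
    raise ValueError(f"unsupported wire type {wire_type}")


def payloads_to_columns(payloads: list[bytes]) -> dict[str, list]:
    """Decode serialized SearchRequest messages into one list per column."""
    columns = {name: [] for name in DEFAULTS}
    for buf in payloads:
        row = dict(DEFAULTS)  # proto3 leaves default values off the wire
        pos = 0
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field_number, wire_type = key >> 3, key & 0x7
            expected = SCHEMA.get(field_number)
            if expected is None or expected[1] != wire_type:
                pos = skip_field(buf, pos, wire_type)
                continue
            name = expected[0]
            if wire_type == 2:
                length, pos = read_varint(buf, pos)
                row[name] = buf[pos:pos + length].decode("utf-8")
                pos += length
            else:
                value, pos = read_varint(buf, pos)
                row[name] = to_int32(value)
        for name, value in row.items():
            columns[name].append(value)
    return columns


payloads = [
    b"\x0a\x11protobuf to arrow\x18\x0a",          # page_number=0 is omitted
    b"\x0a\x11protobuf to arrow\x10\x01\x18\x0a",
]
columns = payloads_to_columns(payloads)
# columns == {'query': ['protobuf to arrow', 'protobuf to arrow'],
#             'page_number': [0, 1], 'result_per_page': [10, 10]}
```

The resulting dict could be passed to `pyarrow.RecordBatch.from_pydict`, ideally with an explicit `schema=` so the integer columns stay `int32` instead of being inferred as `int64`.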
You can also convert a `pyarrow.RecordBatch` back to serialized protobuf messages:
```python
import pyarrow as pa

# One serialized message per row of the record batch.
array: pa.BinaryArray = handler.record_batch_to_array(record_batch)
messages_back: list[SearchRequest] = [
    SearchRequest.FromString(s.as_py()) for s in array
]
```
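The reverse direction can be sketched the same way: walk the columns row by row and write each row out in protobuf wire format. This is a hypothetical pure-Python illustration, not ptars's implementation; it assumes the columns are plain Python lists (for example from `RecordBatch.to_pydict()`), and the names `write_varint`, `encode_row` and `columns_to_payloads` are made up for this example.

```python
# Hypothetical sketch of columns -> proto for SearchRequest; not ptars's code.


def write_varint(value: int, out: bytearray) -> None:
    """Append value as a base-128 varint (negatives as 64-bit two's complement)."""
    value &= 0xFFFFFFFFFFFFFFFF
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return


def encode_row(query: str, page_number: int, result_per_page: int) -> bytes:
    """Serialize one SearchRequest row, skipping proto3 default values."""
    out = bytearray()
    if query:
        data = query.encode("utf-8")
        write_varint((1 << 3) | 2, out)  # field 1, length-delimited
        write_varint(len(data), out)
        out += data
    if page_number:
        write_varint((2 << 3) | 0, out)  # field 2, varint
        write_varint(page_number, out)
    if result_per_page:
        write_varint((3 << 3) | 0, out)  # field 3, varint
        write_varint(result_per_page, out)
    return bytes(out)


def columns_to_payloads(columns: dict[str, list]) -> list[bytes]:
    """Serialize each row of the column lists to one message."""
    return [
        encode_row(q, p, r)
        for q, p, r in zip(
            columns["query"], columns["page_number"], columns["result_per_page"]
        )
    ]


columns = {
    "query": ["protobuf to arrow", "protobuf to arrow"],
    "page_number": [0, 1],
    "result_per_page": [10, 10],
}
payloads = columns_to_payloads(columns)
# payloads[1] == b"\x0a\x11protobuf to arrow\x10\x01\x18\x0a"
```

Writing fields in field-number order and omitting defaults keeps the output valid proto3; any conforming parser, such as `SearchRequest.FromString`, will read these payloads back.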
Ptars is a Rust implementation of protarrow, which is written in plain Python. In the benchmarks below, ptars is:
- Marginally faster when converting from proto to arrow (about 1% by median time).
- About 3 times faster when converting from arrow to proto.
Benchmark `to_arrow` (2 tests; times in ms, values in parentheses are relative to the best result in each column):

| Name | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (ops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_ptars_to_arrow | 7.1125 (1.0) | 8.2581 (1.0) | 7.2747 (1.0) | 0.1299 (1.0) | 7.2498 (1.0) | 0.1243 (1.0) | 20;5 | 137.4636 (1.0) | 134 | 1 |
| test_protarrow_to_arrow | 7.1563 (1.01) | 20.4630 (2.48) | 8.7641 (1.20) | 3.8860 (29.92) | 7.3423 (1.01) | 0.2286 (1.84) | 14;16 | 114.1022 (0.83) | 122 | 1 |
Benchmark `to_proto` (2 tests; times in ms, values in parentheses are relative to the best result in each column):

| Name | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (ops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_ptars_to_proto | 6.3732 (1.0) | 6.9871 (1.0) | 6.6027 (1.0) | 0.1234 (1.0) | 6.5784 (1.0) | 0.1547 (1.0) | 47;3 | 151.4530 (1.0) | 150 | 1 |
| test_protarrow_to_proto | 18.8678 (2.96) | 31.2787 (4.48) | 20.6836 (3.13) | 3.8508 (31.19) | 19.2683 (2.93) | 0.4145 (2.68) | 6;6 | 48.3475 (0.32) | 51 | 1 |
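The speed-ups quoted in the list above can be read off the benchmark tables as the ratio of protarrow's time to ptars's time. This small sketch recomputes them from the mean and median columns (values copied from the tables; `TIMINGS_MS` and `speedup` are illustrative names only):

```python
# Mean and median times (ms) copied from the pytest-benchmark tables above.
TIMINGS_MS = {
    "to_arrow": {"ptars": (7.2747, 7.2498), "protarrow": (8.7641, 7.3423)},
    "to_proto": {"ptars": (6.6027, 6.5784), "protarrow": (20.6836, 19.2683)},
}


def speedup(timings: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Return the (mean, median) speed-up of ptars over protarrow."""
    ptars_mean, ptars_median = timings["ptars"]
    protarrow_mean, protarrow_median = timings["protarrow"]
    return protarrow_mean / ptars_mean, protarrow_median / ptars_median


for name, timings in TIMINGS_MS.items():
    mean_x, median_x = speedup(timings)
    print(f"{name}: {mean_x:.2f}x by mean, {median_x:.2f}x by median")
# to_arrow: 1.20x by mean, 1.01x by median
# to_proto: 3.13x by mean, 2.93x by median
```

For `to_arrow` the mean ratio is pulled up by protarrow's outliers (max 20.46 ms, StdDev 3.89 ms), so the median ratio of about 1.01x better reflects the "marginally faster" claim.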