ptars

Protobuf to Arrow, using Rust

Example

Take a protobuf message definition:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

Then convert serialized messages directly to a pyarrow.RecordBatch:

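from ptars import HandlerPool

# SearchRequest is the Python class generated by protoc from the message above;
# import it from your own generated module.
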
messages = [
    SearchRequest(
        query="protobuf to arrow",
        page_number=0,
        result_per_page=10,
    ),
    SearchRequest(
        query="protobuf to arrow",
        page_number=1,
        result_per_page=10,
    ),
]
payloads = [message.SerializeToString() for message in messages]

pool = HandlerPool()
handler = pool.get_for_message(SearchRequest.DESCRIPTOR)
record_batch = handler.list_to_record_batch(payloads)

The resulting record batch:

query              page_number  result_per_page
protobuf to arrow  0            10
protobuf to arrow  1            10
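
Conceptually, this conversion turns a list of row-oriented messages into one column per field. The sketch below illustrates only that idea with plain Python lists and dicts. It does not use ptars or pyarrow and says nothing about how ptars is implemented; `rows_to_columns` is a hypothetical helper introduced for illustration.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose row-oriented records into one list of values per field."""
    if not rows:
        return {}
    # Use the first row's keys as the column names (hypothetical convention).
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


# The two example messages, written as plain dicts instead of protobuf objects.
rows = [
    {"query": "protobuf to arrow", "page_number": 0, "result_per_page": 10},
    {"query": "protobuf to arrow", "page_number": 1, "result_per_page": 10},
]
columns = rows_to_columns(rows)
print(columns["page_number"])
```

A record batch stores the same information, but each column is a typed Arrow array rather than a Python list.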

You can also convert a pyarrow.RecordBatch back to serialized protobuf messages:

import pyarrow as pa

array: pa.BinaryArray = handler.record_batch_to_array(record_batch)
messages_back: list[SearchRequest] = [
    SearchRequest.FromString(s.as_py()) for s in array
]
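
Each payload passed to the handler, and each value in the returned binary array, is a protobuf wire-format byte string. As a rough sketch of what those bytes contain, the second example message can be encoded by hand with the standard library only. This assumes proto3 semantics (fields equal to their default value are omitted) and non-negative integers; `encode_varint` and `encode_search_request` are hypothetical helpers, not part of ptars or protobuf.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_search_request(query: str, page_number: int, result_per_page: int) -> bytes:
    """Hand-encode a SearchRequest, skipping fields that hold default values."""
    out = bytearray()
    if query:
        data = query.encode("utf-8")
        # Field 1, wire type 2 (length-delimited): tag, length, then the bytes.
        out += encode_varint((1 << 3) | 2) + encode_varint(len(data)) + data
    if page_number:
        # Field 2, wire type 0 (varint).
        out += encode_varint((2 << 3) | 0) + encode_varint(page_number)
    if result_per_page:
        # Field 3, wire type 0 (varint).
        out += encode_varint((3 << 3) | 0) + encode_varint(result_per_page)
    return bytes(out)


payload = encode_search_request("protobuf to arrow", 1, 10)
print(payload.hex())
```

In practice `SerializeToString()` and `FromString()` handle this encoding, including negative numbers and nested messages, so hand-encoding is only useful for understanding what the handler consumes and produces.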

Benchmark against protarrow

ptars is a Rust implementation of protarrow, which is implemented in pure Python. In the benchmarks below, ptars is:

  • marginally faster when converting from proto to Arrow (about 1.2 times faster by mean time, about 1% faster by median).
  • about 3 times faster when converting from Arrow to proto.

----------------------------------------------------------------------------- benchmark 'to_arrow': 2 tests -----------------------------------------------------------------------------
Name (time in ms)              Min                Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_ptars_to_arrow         7.1125 (1.0)       8.2581 (1.0)      7.2747 (1.0)      0.1299 (1.0)      7.2498 (1.0)      0.1243 (1.0)          20;5  137.4636 (1.0)         134           1
test_protarrow_to_arrow     7.1563 (1.01)     20.4630 (2.48)     8.7641 (1.20)     3.8860 (29.92)    7.3423 (1.01)     0.2286 (1.84)        14;16  114.1022 (0.83)        122           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------ benchmark 'to_proto': 2 tests -------------------------------------------------------------------------------
Name (time in ms)               Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_ptars_to_proto          6.3732 (1.0)       6.9871 (1.0)       6.6027 (1.0)      0.1234 (1.0)       6.5784 (1.0)      0.1547 (1.0)          47;3  151.4530 (1.0)         150           1
test_protarrow_to_proto     18.8678 (2.96)     31.2787 (4.48)     20.6836 (3.13)     3.8508 (31.19)    19.2683 (2.93)     0.4145 (2.68)          6;6   48.3475 (0.32)         51           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
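
The speedups quoted in this section can be recomputed from the Mean column of the benchmark output. A minimal sketch, with the mean times copied by hand from the tables above (the `speedup` helper is only for illustration):

```python
# Mean times in milliseconds, copied from the Mean column of the tables above.
mean_ms = {
    "to_arrow": {"ptars": 7.2747, "protarrow": 8.7641},
    "to_proto": {"ptars": 6.6027, "protarrow": 20.6836},
}


def speedup(benchmark: str) -> float:
    """Return how many times faster ptars is than protarrow, by mean time."""
    times = mean_ms[benchmark]
    return times["protarrow"] / times["ptars"]


for name in mean_ms:
    print(f"{name}: {speedup(name):.2f}x")
```

For `to_arrow` the medians differ by only about 1% (7.2498 ms against 7.3423 ms), so most of the gap in the mean comes from protarrow's slower outlier rounds (Max 20.4630 ms, StdDev 3.8860 ms).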