Most modern networking is built either on slow and ambiguous REST APIs or unnecessarily complex gRPC. FastAPI, for example, looks very easy to use:
from fastapi import FastAPI
app = FastAPI()
@app.get('/sum')
def sum(a: int, b: int):
    return a + b
It takes over a millisecond to handle such a call on the same machine. In that time, light could have traveled 300 km in a vacuum, or roughly 200 km through optical fiber: enough to reach a neighboring city or, in my case, a neighboring country.
To make networking faster, one needs just 2 components:
- an efficient serialization format,
- an I/O layer without interrupts (hence the name).
Today, libraries like simdjson can parse JSON documents faster than gRPC can unpack binary ProtoBuf.
Moreover, with io_uring, we can avoid system calls and interrupts on the hot path and still use the TCP/IP stack for maximum compatibility.
By now, you may believe that one can be faster than gRPC, but would that sacrifice usability?
We don't think so.
from ujrpc import Server
serve = Server()
@serve
def sum(a: int, b: int):
    return a + b
This tiny solution already works for C, C++, and Python. It is even easier to use than FastAPI but is 100x faster. Moreover, it supports tensor-like types common in Machine Learning and useful for batch processing:
import numpy as np
from ujrpc import Server
serve = Server()
@serve
def sum_arrays(a: np.array, b: np.array):
    return a + b
We are inviting others to contribute bindings to other languages as well.
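For readers who want to poke at such a server without any client library, here is a minimal sketch of a raw TCP client built only on the Python standard library. It assumes the server accepts a JSON-RPC 2.0 request object with named parameters and answers with a small reply that fits into a single `recv()`. The helper names `make_request` and `call_over_tcp`, as well as any host and port you pass in, are illustrative assumptions and not part of the UJRPC API.

```python
import json
import socket


def make_request(method: str, params: dict, request_id: int) -> bytes:
    """Encode a JSON-RPC 2.0 request object as UTF-8 bytes."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return json.dumps(payload).encode("utf-8")


def call_over_tcp(host: str, port: int, method: str, params: dict, request_id: int = 1) -> dict:
    """Send one request over a fresh TCP connection and decode the reply.

    Assumption: the whole reply arrives in a single recv() call, which is
    only reasonable for tiny responses such as the result of `sum`.
    """
    with socket.create_connection((host, port), timeout=1.0) as conn:
        conn.sendall(make_request(method, params, request_id))
        return json.loads(conn.recv(4096).decode("utf-8"))
```

With a server running, a call would look like `call_over_tcp(host, port, "sum", {"a": 2, "b": 3})`, where `host` and `port` are whatever address your server listens on. Opening a new connection per call corresponds to the "reset" rows in the benchmarks; keeping the socket open between calls would correspond to "reuse".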
All benchmarks were conducted on AWS on general purpose instances with Ubuntu 22.10 AMI, as it is the first major AMI to come with Linux Kernel 5.19, featuring much wider io_uring support for networking operations.
Setup | 🔁 | 1 client on m6i.metal | 32 clients on m6i.metal |
---|---|---|---|
FastAPI over REST | ❌ | 1'002 rps @ 998 μs | 3'553 rps @ 8'988 μs |
FastAPI over WebSocket | ✅ | 12'312 rps @ 81 μs | |
gRPC | ✅ | | |
UJRPC over TCP, reset | ❌ | 90 μs | |
UJRPC over TCP, reuse | ✅ | 25 μs | |
Each cell reports the average number of requests per second (rps), where available, and the average request latency as measured on the client side; μs stands for microseconds. The 🔁 column marks setups that reuse one connection across requests.
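The two numbers in a cell are not independent. If every client sends its next request only after the previous reply has arrived (an assumption about the benchmark loop, not something the table states), throughput is roughly the number of clients divided by the mean latency. The sketch below checks the reported cells against that relation; `implied_rps` is a hypothetical helper, and the rows are copied from the table above.

```python
def implied_rps(clients: int, mean_latency_us: float) -> float:
    """Throughput implied by mean latency when each client sends its next
    request only after the previous reply has arrived."""
    return clients * 1_000_000 / mean_latency_us


# (setup, clients, mean latency in microseconds, reported rps), from the table above
rows = [
    ("FastAPI over REST", 1, 998, 1002),
    ("FastAPI over REST", 32, 8988, 3553),
    ("FastAPI over WebSocket", 1, 81, 12312),
]

for setup, clients, latency_us, reported in rows:
    implied = implied_rps(clients, latency_us)
    print(f"{setup}, {clients} client(s): implied {implied:,.0f} rps, reported {reported:,} rps")
```

The implied and reported values agree to within about half a percent for all three cells. Under the same assumption, a 25 μs mean latency would correspond to about 40'000 rps for a single client.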
Let's start a cluster of small clients and attack some free-tier AWS services, measuring the number of operations they can handle.
Setup | 🔁 | t2.micro | t4g.small |
---|---|---|---|
FastAPI over REST | ❌ | | |
FastAPI over WebSocket | ✅ | | |
gRPC | ✅ | | |
UJRPC over TCP, reset | ❌ | | |
UJRPC over TCP, reuse | ✅ | | |
pip install uvicorn fastapi websocket-client requests tqdm fire
# Run the server from examples/ in a subshell, so the current directory stays unchanged
(cd examples && uvicorn sum.fastapi_server:app --log-level critical) &
python examples/bench.py "sum.fastapi_client.ClientREST" --progress
python examples/bench.py "sum.fastapi_client.ClientWebSocket" --progress
kill %%
Want to dispatch more clients and aggregate statistics?
python examples/bench.py "sum.fastapi_client.ClientREST" --threads 8
python examples/bench.py "sum.fastapi_client.ClientWebSocket" --threads 8
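To make "dispatch more clients and aggregate statistics" concrete, here is a minimal sketch of the general pattern: one worker loop per thread, with all per-request latencies merged afterwards. It is not the code of `examples/bench.py`; the names `worker` and `run_clients`, and the `call` argument standing in for a single request, are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, List


def worker(call: Callable[[], object], requests: int) -> List[float]:
    """Issue `requests` calls back to back, returning each latency in seconds."""
    latencies = []
    for _ in range(requests):
        t0 = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - t0)
    return latencies


def run_clients(call: Callable[[], object], threads: int, requests_per_thread: int) -> dict:
    """Run one worker per thread and merge their latencies into shared statistics."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker, call, requests_per_thread) for _ in range(threads)]
        latencies = [x for f in futures for x in f.result()]
    elapsed = time.perf_counter() - started
    return {
        "requests": len(latencies),
        "rps": len(latencies) / elapsed,
        "mean_latency_us": mean(latencies) * 1e6,
    }
```

For example, `run_clients(lambda: time.sleep(0.001), threads=8, requests_per_thread=100)` simulates eight clients talking to a server with about one millisecond of latency. Python threads share the interpreter lock, so for very fast servers the client itself may become the bottleneck.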
UJRPC can produce both a POSIX-compliant old-school server and a modern io_uring-based version for Linux kernel 5.19 and newer.
You would run either ujrpc_example_sum_posix or ujrpc_example_sum_uring.
sudo apt-get install cmake g++ build-essential
cmake -DCMAKE_BUILD_TYPE=Release -B ./build_release && make -C ./build_release
# Start one of the two servers, not both:
./build_release/build/bin/ujrpc_example_sum_posix &
# ./build_release/build/bin/ujrpc_example_sum_uring &
python examples/bench.py "sum.jsonrpc_client.ClientTCP" --progress
python examples/bench.py "sum.jsonrpc_client.ClientHTTP" --progress
python examples/bench.py "sum.jsonrpc_client.ClientHTTPBatches" --progress
kill %%
Want to dispatch more clients and aggregate statistics?
python examples/bench.py "sum.jsonrpc_client.ClientTCP" --threads 32
python examples/bench.py "sum.jsonrpc_client.ClientHTTP" --threads 32
python examples/bench.py "sum.jsonrpc_client.ClientHTTPBatches" --threads 32
A lot has been said about the speed of Python code, or the lack thereof.
To get more accurate numbers for mean request latency, you can use the Go version:
go run ./examples/sum/ujrpc_client.go
Or push it even further by dispatching dozens of processes with the GNU parallel utility:
sudo apt install parallel
parallel go run ./examples/sum/ujrpc_client.go run ::: {1..32}
pip install grpcio grpcio-tools
python ./sum/grpc_server.py &
python examples/bench.py "sum.grpc_client.gRPCClient" --progress
python examples/bench.py "sum.grpc_client.gRPCClient" --threads 32
kill %%
- Transport independent: UDP, TCP, bring what you want.
- Application layer is optional: use HTTP or not.
- Unlike REST APIs, there is just one way to pass arguments.
- Batch requests
- JSON-RPC over raw TCP sockets
- JSON-RPC over TCP with HTTP
- Concurrent sessions
- HTTPS Support
- Complementing JSON with Amazon Ion
- Custom UDP-based JSON-RPC like protocol
- AF_XDP on Linux
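Two of the points above, the single way to pass arguments and batch requests, are easiest to see on the wire. The sketch below follows the JSON-RPC 2.0 specification, where arguments always travel in the `params` member and a batch is simply a JSON array of request objects. The helpers `make_batch` and `match_results` are hypothetical names, and whether a given UJRPC build accepts exactly this shape is an assumption.

```python
import json
from typing import Dict, List, Tuple


def make_batch(calls: List[Tuple[str, dict]]) -> str:
    """Encode several calls as one JSON-RPC 2.0 batch (a JSON array of requests)."""
    requests = [
        {"jsonrpc": "2.0", "method": method, "params": params, "id": i}
        for i, (method, params) in enumerate(calls)
    ]
    return json.dumps(requests)


def match_results(batch_reply: str) -> Dict[int, object]:
    """Map request ids to results; replies in a batch may arrive in any order."""
    return {item["id"]: item.get("result") for item in json.loads(batch_reply)}
```

Compare this with a REST endpoint, where the same two integers could arrive in the path, the query string, headers, or the body. Here, `make_batch([("sum", {"a": 1, "b": 2}), ("sum", {"a": 3, "b": 4})])` yields a single payload, and the `id` field is what lets the client pair each reply with its request.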