Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️


Uninterrupted JSON RPC

Understandable Remote Procedure Calls
100x Faster than FastAPI




Most modern networking is built either on slow and ambiguous REST APIs or on unnecessarily complex gRPC. FastAPI, for example, looks very easy to use:

from fastapi import FastAPI

app = FastAPI()

@app.get('/sum')
def sum(a: int, b: int):
    return a + b

It takes over a millisecond to handle such a call on the same machine. In that time, light could have traveled 300 km through optical fiber, reaching a neighboring city or, in my case, a neighboring country.
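That millisecond is easy to measure for yourself. Below is a minimal sketch of such a measurement, assuming the app above is served by uvicorn on the default port 8000; the exact numbers will vary with hardware and configuration.

import time

import requests

session = requests.Session()  # reuse one TCP connection across calls
n = 1000
start = time.perf_counter()
for _ in range(n):
    session.get('http://127.0.0.1:8000/sum', params={'a': 1, 'b': 2}).json()
elapsed = time.perf_counter() - start
print(f'{elapsed / n * 1e6:.0f} μs per call')  # typically over 1'000 μs, per the claim above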


To make networking faster, one needs just two components:

  1. an efficient serialization format,
  2. an I/O layer without interrupts (hence the name).

Today, libraries like simdjson can parse JSON documents faster than gRPC can unpack binary ProtoBuf. Moreover, with io_uring, we can avoid system calls and interrupts on the hot path and still use the TCP/IP stack for maximum compatibility. By now you may believe that one can be faster than gRPC, but would that sacrifice usability? We don't think so.

from ujrpc import Server

serve = Server()

@serve
def sum(a: int, b: int):
    return a + b

This tiny solution already works for C, C++, and Python. It is even easier to use than FastAPI but is 100x faster. Moreover, it supports tensor-like types common in Machine Learning and useful for batch processing:

import numpy as np
from ujrpc import Server

serve = Server()

@serve
def sum_arrays(a: np.ndarray, b: np.ndarray):
    return a + b

We are inviting others to contribute bindings to other languages as well.
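For the curious, here is roughly what one such call looks like on the wire: a plain JSON-RPC 2.0 request over a raw TCP socket. This is only a sketch; the host and port are assumptions, and the benchmarking clients below wrap the same idea.

import json
import socket

# The whole request is one small JSON object: method name plus named params.
request = {'jsonrpc': '2.0', 'method': 'sum', 'params': {'a': 1, 'b': 2}, 'id': 0}
with socket.create_connection(('127.0.0.1', 8545)) as sock:  # the port is an assumption
    sock.sendall(json.dumps(request).encode())
    reply = json.loads(sock.recv(4096).decode())
print(reply)  # expecting: {'jsonrpc': '2.0', 'result': 3, 'id': 0}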

Benchmarks

All benchmarks were conducted on AWS on general-purpose instances with the Ubuntu 22.10 AMI, as it is the first major AMI to ship with Linux kernel 5.19, featuring much broader io_uring support for networking operations.

| Setup 🔁                | 1 client on m6i.metal | 32 clients on m6i.metal |
| :---------------------- | :-------------------- | :---------------------- |
| FastAPI over REST       | 1'002 rps @ 998 μs    | 3'553 rps @ 8'988 μs    |
| FastAPI over WebSocket  | 12'312 rps @ 81 μs    |                         |
| gRPC                    |                       |                         |
| UJRPC over TCP, reset   | 90 μs                 |                         |
| UJRPC over TCP, reuse   | 25 μs                 |                         |

In every cell we report the average number of requests per second (rps), as well as the average request latency as measured on the client side; μs means microseconds.
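As a sanity check, the two numbers in a cell are consistent with each other: 32 concurrent clients each waiting about 8'988 μs per call can sustain roughly 32 / 0.008988 ≈ 3'560 requests per second, which matches the measured 3'553 rps.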

Let's start a cluster of small clients and attack some free-tier AWS services, measuring the number of operations they can handle.

| Setup 🔁                | t2.micro | t4g.small |
| :---------------------- | :------- | :-------- |
| FastAPI over REST       |          |           |
| FastAPI over WebSocket  |          |           |
| gRPC                    |          |           |
| UJRPC over TCP, reset   |          |           |
| UJRPC over TCP, reuse   |          |           |

Reproducing Results

FastAPI

pip install uvicorn fastapi websocket-client requests tqdm fire
# note: the `cd` runs inside the backgrounded subshell, so the parent shell stays in the repo root
cd examples && uvicorn sum.fastapi_server:app --log-level critical &
python examples/bench.py "sum.fastapi_client.ClientREST" --progress
python examples/bench.py "sum.fastapi_client.ClientWebSocket" --progress
kill %%

Want to dispatch more clients and aggregate statistics?

python examples/bench.py "sum.fastapi_client.ClientREST" --threads 8
python examples/bench.py "sum.fastapi_client.ClientWebSocket" --threads 8

UJRPC

UJRPC can produce both a POSIX-compliant old-school server and a modern io_uring-based version for Linux kernel 5.19 and newer. You would run either ujrpc_example_sum_posix or ujrpc_example_sum_uring.

sudo apt-get install cmake g++ build-essential
cmake -DCMAKE_BUILD_TYPE=Release -B ./build_release && make -C ./build_release
# start one of the two servers, depending on your kernel version:
./build_release/build/bin/ujrpc_example_sum_posix &
# ./build_release/build/bin/ujrpc_example_sum_uring &
python examples/bench.py "sum.jsonrpc_client.ClientTCP" --progress
python examples/bench.py "sum.jsonrpc_client.ClientHTTP" --progress
python examples/bench.py "sum.jsonrpc_client.ClientHTTPBatches" --progress
kill %%

Want to dispatch more clients and aggregate statistics?

python examples/bench.py "sum.jsonrpc_client.ClientTCP" --threads 32
python examples/bench.py "sum.jsonrpc_client.ClientHTTP" --threads 32
python examples/bench.py "sum.jsonrpc_client.ClientHTTPBatches" --threads 32

A lot has been said about the speed of Python code, or the lack of it. To get more accurate numbers for mean request latency, you can use the Go version:

go run ./examples/sum/ujrpc_client.go

Or push it even further by dispatching dozens of processes with the GNU parallel utility:

sudo apt install parallel
parallel go run ./examples/sum/ujrpc_client.go ::: {1..32}

gRPC Results

pip install grpcio grpcio-tools
python ./examples/sum/grpc_server.py &
python examples/bench.py "sum.grpc_client.gRPCClient" --progress
python examples/bench.py "sum.grpc_client.gRPCClient" --threads 32
kill %%

Why JSON-RPC?

  • Transport independent: UDP, TCP, bring what you want.
  • Application layer is optional: use HTTP or not.
  • Unlike REST APIs, there is just one way to pass arguments, as the sketch below shows.
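To make the last point concrete: a single call is always one JSON object with named parameters, and a batch is just an array of them, answered in a single round-trip. A sketch reusing the sum method from above:

# One call: named parameters, nothing hidden in paths, query strings, or headers.
single = {'jsonrpc': '2.0', 'method': 'sum', 'params': {'a': 1, 'b': 2}, 'id': 1}

# A batch, per the JSON-RPC 2.0 specification: an array of such calls,
# served with one network round-trip instead of many.
batch = [
    {'jsonrpc': '2.0', 'method': 'sum', 'params': {'a': i, 'b': i + 1}, 'id': i}
    for i in range(3)
]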

Roadmap

  • Batch requests
  • JSON-RPC over raw TCP sockets
  • JSON-RPC over TCP with HTTP
  • Concurrent sessions
  • HTTPS Support
  • Complementing JSON with Amazon Ion
  • Custom UDP-based JSON-RPC like protocol
  • AF_XDP on Linux