feat: generated protobuf classes (#1)
danepitkin authored Apr 24, 2023
1 parent ebc766d commit 70a3cb8
Showing 26 changed files with 888 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
src/substrait/gen/** linguist-generated=true
18 changes: 18 additions & 0 deletions .gitignore
@@ -127,3 +127,21 @@ dmypy.json

# Pyre type checker
.pyre/

# setuptools_scm dynamic versioning
src/substrait/_version.py

# Buf working directory
/buf_work_dir/

# Editor files
.idea
.vscode

# OS generated files
.directory
.gdb_history
.DS_Store

# Python interface files
*.pyi
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "third_party/substrait"]
path = third_party/substrait
url = https://github.com/substrait-io/substrait
49 changes: 49 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,49 @@
# Getting Started
## Get the repo
Fork and clone the repo.
```
git clone --recursive https://github.com/<your-fork>/substrait-python.git
cd substrait-python
```
## Update the substrait submodule locally
This might be necessary if you are updating an existing checkout.
```
git submodule sync --recursive
git submodule update --init --recursive
```
## Upgrade the substrait submodule
After upgrading the submodule, you must regenerate the protobuf classes by running `./gen_proto.sh`.
```
cd third_party/substrait
git checkout <version>
cd -
git commit . -m "Use submodule <version>"
```


# Setting up your environment
## Conda env
Create a conda environment with developer dependencies.
```
conda env create -f environment.yml
conda activate substrait-python-env
```

# Build
## Python package
Install the package in editable (development) mode.
```
pip install -e .
```

## Generate protocol buffers
Generate the protobuf classes manually. This requires protobuf `3.20.1`, plus `buf` and `protoletariat`; all three are included in the conda environment above.
```
./gen_proto.sh
```
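Because the generated code depends on the exact protobuf version, you may want to check the pin before running `./gen_proto.sh`. The following is a minimal, stdlib-only sketch; the helper names `installed_version` and `matches_pin` are hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Pin taken from environment.yml and the gen_proto extra in pyproject.toml.
REQUIRED_PROTOBUF = "3.20.1"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], required: str) -> bool:
    """True only when the installed version string equals the pinned one exactly."""
    return installed is not None and installed == required


if __name__ == "__main__":
    found = installed_version("protobuf")
    if matches_pin(found, REQUIRED_PROTOBUF):
        print(f"protobuf {found} OK")
    else:
        print(f"expected protobuf {REQUIRED_PROTOBUF}, found {found}")
```

Saved as a script and run inside the activated environment, it only reports what it finds and changes nothing.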

# Test
Run the tests from the project's root directory.
```
pytest
```
165 changes: 165 additions & 0 deletions README.md
@@ -0,0 +1,165 @@
# Substrait

A Python package for [Substrait](https://substrait.io), the cross-language specification for data compute operations.

## Goals
This project aims to provide a Python interface for the Substrait specification. It will allow users to construct and manipulate a Substrait Plan from Python for evaluation by a Substrait consumer, such as DataFusion or DuckDB.

## Non-goals
This project is not an execution engine for Substrait Plans.

## Status
This is an experimental package that is still under development.

# Example
At the moment, this project contains only the generated Python classes for the Substrait protobuf messages. The example below uses an existing Substrait producer, [Ibis](https://ibis-project.org) (through the `ibis-substrait` compiler), to build a plan, and then parses that plan with Python Substrait.
## Produce a Substrait Plan with Ibis
```
In [1]: import ibis
In [2]: movie_ratings = ibis.table(
...: [
...: ("tconst", "str"),
...: ("averageRating", "str"),
...: ("numVotes", "str"),
...: ],
...: name="ratings",
...: )
...:
In [3]: query = movie_ratings.select(
...: movie_ratings.tconst,
...: avg_rating=movie_ratings.averageRating.cast("float"),
...: num_votes=movie_ratings.numVotes.cast("int"),
...: )
In [4]: from ibis_substrait.compiler.core import SubstraitCompiler
In [5]: compiler = SubstraitCompiler()
In [6]: protobuf_msg = compiler.compile(query).SerializeToString()
In [7]: type(protobuf_msg)
Out[7]: bytes
```
## Consume the Substrait Plan using Python Substrait
```
In [8]: import substrait
In [9]: from substrait.gen.proto.plan_pb2 import Plan
In [10]: my_plan = Plan()
In [11]: my_plan.ParseFromString(protobuf_msg)
Out[11]: 186
In [12]: print(my_plan)
relations {
root {
input {
project {
common {
emit {
output_mapping: 3
output_mapping: 4
output_mapping: 5
}
}
input {
read {
common {
direct {
}
}
base_schema {
names: "tconst"
names: "averageRating"
names: "numVotes"
struct {
types {
string {
nullability: NULLABILITY_NULLABLE
}
}
types {
string {
nullability: NULLABILITY_NULLABLE
}
}
types {
string {
nullability: NULLABILITY_NULLABLE
}
}
nullability: NULLABILITY_REQUIRED
}
}
named_table {
names: "ratings"
}
}
}
expressions {
selection {
direct_reference {
struct_field {
}
}
root_reference {
}
}
}
expressions {
cast {
type {
fp64 {
nullability: NULLABILITY_NULLABLE
}
}
input {
selection {
direct_reference {
struct_field {
field: 1
}
}
root_reference {
}
}
}
failure_behavior: FAILURE_BEHAVIOR_THROW_EXCEPTION
}
}
expressions {
cast {
type {
i64 {
nullability: NULLABILITY_NULLABLE
}
}
input {
selection {
direct_reference {
struct_field {
field: 2
}
}
root_reference {
}
}
}
failure_behavior: FAILURE_BEHAVIOR_THROW_EXCEPTION
}
}
}
}
names: "tconst"
names: "avg_rating"
names: "num_votes"
}
}
version {
minor_number: 24
producer: "ibis-substrait"
}
```
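In the session above, `ParseFromString` returns the number of bytes it consumed (186). To see how those bytes are laid out without the generated classes, here is a stdlib-only sketch that walks the top-level fields of a serialized message using the protobuf wire format, where each field starts with a varint key `field_number << 3 | wire_type` followed by its payload. It is illustrative only: it does not handle the deprecated group wire types, and the helper names `read_varint` and `top_level_fields` are hypothetical, not part of this package.

```python
from typing import Iterator, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def top_level_fields(buf: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (field_number, wire_type, payload_size) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            start = pos
            _, pos = read_varint(buf, pos)
            size = pos - start
        elif wire_type == 1:  # fixed 64-bit
            size = 8
            pos += 8
        elif wire_type == 2:  # length-delimited: strings, bytes, sub-messages
            size, pos = read_varint(buf, pos)
            pos += size
        elif wire_type == 5:  # fixed 32-bit
            size = 4
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        yield field_number, wire_type, size
```

Applied to `protobuf_msg` from the session, for example `list(top_level_fields(protobuf_msg))`, the entries should line up with the top-level `relations` and `version` fields printed by `print(my_plan)`.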
4 changes: 4 additions & 0 deletions buf.gen.yaml
@@ -0,0 +1,4 @@
plugins:
- name: python
out: src/substrait/gen
version: v1
3 changes: 3 additions & 0 deletions buf.work.yaml
@@ -0,0 +1,3 @@
version: v1
directories:
- buf_work_dir
7 changes: 7 additions & 0 deletions buf.yaml
@@ -0,0 +1,7 @@
version: v1
breaking:
use:
- FILE
lint:
use:
- DEFAULT
12 changes: 12 additions & 0 deletions environment.yml
@@ -0,0 +1,12 @@
name: substrait-python-env
channels:
- conda-forge
dependencies:
- buf
- pip
- protobuf = 3.20.1 # protobuf==3.20 C extensions aren't compatible with 3.19.4
- protoletariat >= 2.0.0
- pytest >= 7.0.0
- python >= 3.8.1
- setuptools >= 61.0.0
- setuptools_scm >= 6.2.0
23 changes: 23 additions & 0 deletions gen_proto.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

set -eou pipefail

namespace=proto
submodule_dir=./third_party/substrait
src_dir="$submodule_dir"/proto
tmp_dir=./buf_work_dir
dest_dir=./src/substrait/gen

# Prefix the protobuf files with a unique namespace to prevent conflicts
# with other Substrait packages. Save the output to the work dir.
python "$submodule_dir"/tools/proto_prefix.py "$tmp_dir" "$namespace" "$src_dir"

# Remove the old python protobuf files
rm -rf "$dest_dir"

# Generate the new python protobuf files
buf generate
protol --in-place --create-package --python-out "$dest_dir" buf

# Remove the temporary work dir
rm -rf "$tmp_dir"
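The script's steps can be summarized as an ordered list of commands. The following dry-run sketch only prints them and executes nothing; the function name `gen_proto_steps` is hypothetical, and the commands and paths are copied from the script above. The `protol` step uses protoletariat to rewrite the absolute imports in the generated `*_pb2.py` modules into package-relative ones, so they resolve under `substrait.gen`.

```python
import shlex
from typing import List

# Values mirror the variables at the top of gen_proto.sh.
NAMESPACE = "proto"
SUBMODULE_DIR = "./third_party/substrait"
SRC_DIR = f"{SUBMODULE_DIR}/proto"
TMP_DIR = "./buf_work_dir"
DEST_DIR = "./src/substrait/gen"


def gen_proto_steps() -> List[List[str]]:
    """Return the gen_proto.sh commands, in order, as argument lists."""
    return [
        # Prefix the protos and write them into the temporary buf work dir.
        ["python", f"{SUBMODULE_DIR}/tools/proto_prefix.py", TMP_DIR, NAMESPACE, SRC_DIR],
        # Remove the previously generated Python files.
        ["rm", "-rf", DEST_DIR],
        # Generate Python modules as configured in buf.gen.yaml.
        ["buf", "generate"],
        # Rewrite the generated imports to be package-relative.
        ["protol", "--in-place", "--create-package", "--python-out", DEST_DIR, "buf"],
        # Clean up the temporary work dir.
        ["rm", "-rf", TMP_DIR],
    ]


if __name__ == "__main__":
    # Dry run: print each command as a shell-quoted line.
    for argv in gen_proto_steps():
        print(shlex.join(argv))
```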
23 changes: 23 additions & 0 deletions pyproject.toml
@@ -0,0 +1,23 @@
[project]
name = "substrait"
description = "A Python package for Substrait."
authors = [{name = "Substrait contributors", email = "substrait@googlegroups.com"}]
license = {text = "Apache-2.0"}
readme = "README.md"
requires-python = ">=3.8.1"
dependencies = ["protobuf >= 3.20"]
dynamic = ["version"]

[tool.setuptools_scm]
write_to = "src/substrait/_version.py"

[project.optional-dependencies]
gen_proto = ["protobuf == 3.20.1", "protoletariat >= 2.0.0"]
test = ["pytest >= 7.0.0"]

[tool.pytest.ini_options]
pythonpath = "src"

[build-system]
requires = ["setuptools>=61.0.0", "setuptools_scm[toml]>=6.2.0"]
build-backend = "setuptools.build_meta"
4 changes: 4 additions & 0 deletions src/substrait/__init__.py
@@ -0,0 +1,4 @@
try:
from ._version import __version__
except ImportError:
pass
Empty file added src/substrait/gen/__init__.py
Empty file.
Empty file.
273 changes: 273 additions & 0 deletions src/substrait/gen/proto/algebra_pb2.py

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions src/substrait/gen/proto/capabilities_pb2.py

Some generated files are not rendered by default.

20 changes: 20 additions & 0 deletions src/substrait/gen/proto/extended_expression_pb2.py

Some generated files are not rendered by default.

Empty file.
25 changes: 25 additions & 0 deletions src/substrait/gen/proto/extensions/extensions_pb2.py

Some generated files are not rendered by default.
