The Feldera Continuous Analytics Platform

The Feldera Continuous Analytics Platform, or Feldera Platform for short, is a fast computational engine, with associated components, for continuous analytics over data in motion. Feldera Platform allows users to configure data pipelines as standing SQL programs (DDLs) that are continuously evaluated as new data arrives from various sources. What makes Feldera's engine unique is its ability to evaluate arbitrary SQL programs incrementally, which makes it more expressive and performant than existing alternatives such as streaming engines.

With the Feldera Platform, software engineers and data scientists configuring data pipelines are not exposed to the complexities of querying changing data, an otherwise notoriously hard problem. Instead, they express their computations as standing queries, and the Feldera Platform evaluates these queries incrementally, correctly and efficiently.
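The following minimal Python sketch illustrates what "evaluating a standing query incrementally" means. It is not Feldera code, and the names `IncrementalSum` and `recompute` are hypothetical. It maintains the result of `SELECT k, SUM(v) FROM t GROUP BY k` from batches of inserts and deletes without rescanning the table. It assumes each change arrives as a `(k, v, weight)` triple, with weight +1 for an insert and -1 for a delete.

```python
from collections import Counter


class IncrementalSum:
    """Maintains SELECT k, SUM(v) FROM t GROUP BY k from batches of changes."""

    def __init__(self):
        self.sums = Counter()    # group key -> current SUM(v)
        self.counts = Counter()  # group key -> number of rows currently in the group

    def apply(self, changes):
        """Apply one batch of (k, v, weight) changes; work is O(len(changes))."""
        touched = set()
        for k, v, w in changes:
            self.sums[k] += v * w
            self.counts[k] += w
            touched.add(k)
        for k in touched:
            if self.counts[k] == 0:  # the last row of this group was deleted
                del self.sums[k]
                del self.counts[k]
        return self.view()

    def view(self):
        return dict(self.sums)


def recompute(rows):
    """Baseline: evaluate the same query from scratch over all (k, v) rows."""
    out = {}
    for k, v in rows:
        out[k] = out.get(k, 0) + v
    return out


# Usage: the incremental result matches full recomputation after every batch.
q = IncrementalSum()
q.apply([("a", 10, +1), ("b", 5, +1), ("a", 7, +1)])
assert q.view() == recompute([("a", 10), ("b", 5), ("a", 7)])
q.apply([("a", 10, -1), ("b", 5, -1)])  # delete two rows
assert q.view() == recompute([("a", 7)])
```

The per-group row count is tracked separately from the sum. This way a group disappears from the result only when its last row is deleted, not when its sum happens to reach zero.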

To this end, we set the following high-level objectives:

Full SQL support and more. Our goal is to support the complete SQL syntax and semantics, including joins and aggregates, correlated subqueries, window functions, complex data types, time series operators, UDFs, and recursive queries.
Scalability in multiple dimensions. The platform scales with the number and complexity of queries, the input data rate, and the amount of state the system maintains in order to process the queries.
Performance out of the box. The user should be able to focus on the business logic of their application, leaving it to the system to evaluate this logic efficiently.
Datasets larger than RAM. Our platform is designed to handle datasets that exceed the available RAM, with efficient retrieval of data from NVMe storage.

Architecture

With Feldera Platform, users create data pipelines out of SQL programs and data connectors. A SQL program comprises tables and views. Connectors feed data to input tables in a program or receive the outputs computed by views. Currently supported connectors include Kafka, Redpanda, and an HTTP API for pushing data directly to tables and pulling results from views. We are working on more connectors, such as ones for database CDC streams. Let us know of any connectors you would like us to develop.
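A toy, in-process Python model can illustrate this data flow: connectors push changes into a table, and a view's changes go to output connectors. It is not Feldera's API. `Pipeline`, `push`, `subscribe`, the `(name, amount)` schema and the `amount > 100` view are all hypothetical. Real pipelines are written in SQL and connected through Kafka, Redpanda or the HTTP API.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]          # hypothetical table schema: (name, amount)
Delta = List[Tuple[Row, int]]  # batch of (row, weight): +1 insert, -1 delete


class Pipeline:
    """Toy model: one input table, one view, and output 'connectors'.

    The view is SELECT * FROM t WHERE predicate(row).
    """

    def __init__(self, predicate: Callable[[Row], bool]):
        self.predicate = predicate
        self.subscribers: List[Callable[[Delta], None]] = []

    def subscribe(self, callback: Callable[[Delta], None]) -> None:
        """Register an output 'connector' that receives changes to the view."""
        self.subscribers.append(callback)

    def push(self, delta: Delta) -> None:
        """Called by an input 'connector' with a batch of changes to the table."""
        # A filter is linear: the change to the view is just the filtered
        # change to the table, so this view needs no stored table state.
        out = [(row, w) for row, w in delta if self.predicate(row)]
        if out:
            for callback in self.subscribers:
                callback(out)


# Usage: an output connector that simply records what it receives.
received: List[Delta] = []
p = Pipeline(lambda row: row[1] > 100)
p.subscribe(received.append)
p.push([(("alice", 50), +1), (("bob", 250), +1)])
p.push([(("bob", 250), -1)])  # deleting bob's row retracts it from the view
```

After the two pushes, the output connector has seen only the view's changes: bob's row inserted, then retracted. Alice's row never reaches it.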

Feldera Platform fundamentally operates on changes to data, i.e., inserts into and deletes from tables. This model covers all kinds of data-in-motion use cases, such as insert-only streams of event, log, HTTP and time-series data, as well as changes to traditional databases extracted via CDC streams.
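One way to represent such changes is as a multiset with integer weights: +1 for an insert and -1 for a delete. This is the representation used by the DBSP model described under Theory, where it is called a Z-set. The minimal Python sketch below assumes rows are hashable tuples. The function names are hypothetical, and an update is modeled as a delete of the old row plus an insert of the new one.

```python
from collections import Counter


def apply_changes(state: Counter, changes):
    """Add a batch of (row, weight) changes to a table; drop rows whose weight reaches 0."""
    for row, weight in changes:
        state[row] += weight
        # Weights may also go negative (e.g. deleting a row not yet seen);
        # Z-sets allow that, so only exact zeros are removed.
        if state[row] == 0:
            del state[row]
    return state


def update(old_row, new_row):
    """Express an UPDATE as a delete of the old row plus an insert of the new row."""
    return [(old_row, -1), (new_row, +1)]


table = Counter()
# Insert-only stream, e.g. sensor readings.
apply_changes(table, [(("sensor-1", 20.5), +1), (("sensor-2", 19.0), +1)])
# CDC-style update of an existing row.
apply_changes(table, update(("sensor-1", 20.5), ("sensor-1", 21.0)))
# table == Counter({("sensor-2", 19.0): 1, ("sensor-1", 21.0): 1})
```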

The following diagram shows Feldera Platform's architecture.

What is in this repository?

This repository comprises all the building blocks to run continuous analytics pipelines using Feldera Platform.

web UI: a web interface for writing SQL, setting up connectors, and managing pipelines.
pipeline-manager: serves the web UI and is the REST API server for building and managing data pipelines.
dbsp: the core engine that allows us to evaluate arbitrary queries incrementally.
SQL compiler: translates SQL programs into DBSP programs.
connectors: to stream data in and out of Feldera Platform pipelines.

Quick start with Docker

First, make sure you have Docker Compose installed.

Next, run the following command to download a Docker Compose file, and use it to bring up a Feldera Platform deployment suitable for demos, development and testing:

curl -L https://github.com/feldera/feldera/releases/latest/download/docker-compose.yml | \
docker compose -f - --profile demo up

Downloading the container images can take a while. About ten seconds after the download completes, the Feldera web console becomes available. Visit http://localhost:8080 in your browser to open it. We suggest going through our demo next.
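If you script the demo setup, you may want to wait until the console is reachable before opening it. The following convenience sketch is not part of Feldera, and the `wait_for_console` name and its defaults are assumptions. It uses only the Python standard library to poll http://localhost:8080 until the server answers.

```python
import time
import urllib.error
import urllib.request


def wait_for_console(url="http://localhost:8080", timeout=120.0, interval=2.0):
    """Poll `url` until it returns an HTTP status below 500; False after `timeout` s."""
    # Bypass any configured HTTP proxy: the console is expected on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True  # successful (2xx/3xx) response
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # server is up, but answered with a client error
        except OSError:
            pass  # connection refused, reset or timed out: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_console()` returns `True` once the web console responds, or `False` if it is still unreachable after two minutes.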

Our Getting Started guide has more detailed instructions on running the demo.

Running Feldera from sources

To run Feldera from sources, first install all the required dependencies. These include the Rust toolchain (at least 1.75), Java (at least JDK 19), Maven and TypeScript.
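A small Python sketch can check the minimum versions above before you start a build. It is not a script shipped with Feldera, and its function names are hypothetical. It assumes `rustc`, `java` and `mvn` are on `PATH`, and it does not check TypeScript.

```python
import re
import shutil
import subprocess

MIN_RUST = (1, 75)  # "at least 1.75"
MIN_JDK = 19        # "at least JDK 19"


def rust_version(text):
    """Extract (major, minor) from `rustc --version` output such as 'rustc 1.75.0 (...)'."""
    m = re.search(r"rustc (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def java_major(text):
    """Extract the major version from `java -version` output, e.g. 'openjdk version "21.0.2"'."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if not m:
        return None
    major = int(m.group(1))
    # Java 8 and older report versions as "1.8.0_..."; the real major is the second field.
    return int(m.group(2)) if major == 1 and m.group(2) else major


def tool_output(cmd):
    """Run `cmd` and return stdout+stderr (java prints its version to stderr), or '' if absent."""
    if shutil.which(cmd[0]) is None:
        return ""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr


def check_prerequisites():
    """Return a list of problems; an empty list means Rust, Java and Maven look usable."""
    problems = []
    rust = rust_version(tool_output(["rustc", "--version"]))
    if rust is None or rust < MIN_RUST:
        problems.append(f"need rustc >= 1.75, found {rust}")
    jdk = java_major(tool_output(["java", "-version"]))
    if jdk is None or jdk < MIN_JDK:
        problems.append(f"need JDK >= {MIN_JDK}, found {jdk}")
    if shutil.which("mvn") is None:
        problems.append("mvn (Maven) not found on PATH")
    return problems
```

Calling `check_prerequisites()` returns an empty list when the toolchain looks ready. Otherwise it returns one message per missing or outdated tool.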

Then build the SQL compiler:

cd sql-to-dbsp-compiler
mvn package -DskipTests

Next, from the repository root, run the pipeline-manager:

cargo run --bin=pipeline-manager --features pg-embed

As with the Docker instructions above, you can now visit http://localhost:8080 in your browser to see the Feldera web console.

Documentation

To learn more about Feldera Platform, we recommend going through the documentation.

Contributing

Most of the software in this repository is governed by an open-source license. We welcome contributions; see CONTRIBUTING.md for guidelines.

Theory

Feldera Platform achieves its objectives by building on a solid mathematical foundation. The formal model that underpins our system, called DBSP, is described in the accompanying paper:

Budiu, Chajed, McSherry, Ryzhyk, Tannen. DBSP: Automatic Incremental View Maintenance for Rich Query Languages. International Conference on Very Large Data Bases (VLDB), August 2023, Vancouver, Canada.
Here is a presentation about DBSP at the 2023 Apache Calcite Meetup.

The model provides two things:

Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.
Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of the input rather than the entire state of the database.
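Joins show where this efficiency comes from. DBSP's incrementalization of joins rests on join being bilinear, so the change to A ⋈ B in one step is Δ(A ⋈ B) = ΔA ⋈ B_old + A_old ⋈ ΔB + ΔA ⋈ ΔB. The Python sketch below illustrates that identity and is not the dbsp crate's implementation; `IncrementalJoin` and `step` are hypothetical names. It assumes rows are `(key, value)` pairs with weights +1 (insert) or -1 (delete). Both inputs are kept indexed by key, so each step does work only for keys that appear in the incoming changes.

```python
from collections import Counter, defaultdict


class IncrementalJoin:
    """Maintains A JOIN B ON key from batches of weighted changes to A and B."""

    def __init__(self):
        self.a = defaultdict(Counter)  # key -> {A value: weight}, contents of A so far
        self.b = defaultdict(Counter)  # key -> {B value: weight}, contents of B so far

    def step(self, da, db):
        """Consume changes da, db; return the change to the join as {(key, a, b): weight}."""
        out = Counter()
        # Term 1: dA joined with the old B (B before this step).
        for (k, av), aw in da:
            for bv, bw in self.b[k].items():
                out[(k, av, bv)] += aw * bw
        # Fold dA into A, so that joining the new A with dB covers
        # term 2 (old A joined with dB) and term 3 (dA joined with dB).
        for (k, av), aw in da:
            self.a[k][av] += aw
        for (k, bv), bw in db:
            for av, aw in self.a[k].items():
                out[(k, av, bv)] += aw * bw
        for (k, bv), bw in db:
            self.b[k][bv] += bw
        # Only rows whose weight actually changed belong to the output delta.
        return {row: w for row, w in out.items() if w != 0}


j = IncrementalJoin()
j.step(da=[(("k1", "a1"), +1)], db=[(("k1", "b1"), +1), (("k2", "b2"), +1)])
delta = j.step(da=[(("k2", "a2"), +1)], db=[(("k1", "b1"), -1)])
# delta == {("k2", "a2", "b2"): 1, ("k1", "a1", "b1"): -1}
```

The second step emits one new joined row and retracts one old row. It never revisits the rest of either table.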
