Streaming and Incremental Computation Framework

The Feldera Continuous Analytics Platform

License: MIT

The Feldera Continuous Analytics Platform, or Feldera Platform for short, is a fast computational engine and associated components for continuous analytics over data in motion. Feldera Platform allows users to configure data pipelines as standing SQL programs (DDLs) that are continuously evaluated as new data arrives from various sources. What makes Feldera's engine unique is its ability to evaluate arbitrary SQL programs incrementally, making it more expressive and performant than existing alternatives such as streaming engines.

With the Feldera Platform, software engineers and data scientists configuring data pipelines are not exposed to the complexities of querying changing data, an otherwise notoriously hard problem. Instead, they can express their computations as standing queries and have the Feldera Platform evaluate these queries incrementally, correctly, and efficiently.

To this end we set the following high-level objectives:

  1. Full SQL support and more. Our goal is to support the complete SQL syntax and semantics, including joins and aggregates, correlated subqueries, window functions, complex data types, time series operators, UDFs, and recursive queries.

  2. Scalability in multiple dimensions. The platform scales with the number and complexity of queries, input data rate and the amount of state the system maintains in order to process the queries.

  3. Performance out of the box. The user should be able to focus on the business logic of their application, leaving it to the system to evaluate this logic efficiently.

Architecture

With Feldera Platform, users create data pipelines out of SQL programs and data connectors. A SQL program comprises tables and views. Connectors feed data to input tables in a program or receive outputs computed by views. Connectors currently supported include Kafka, Redpanda, and an HTTP API for pushing data directly to tables and pulling it from views. We are working on more connectors, such as ones for database CDC streams. Let us know of any connectors you would like us to develop.

Feldera Platform fundamentally operates on changes to data, i.e., inserts and deletes to tables. This model covers all kinds of data-in-motion use cases, such as insert-only streams of event, log, HTTP, and time-series data, as well as changes to traditional databases extracted via CDC streams.
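
To make the change-based model concrete, here is a minimal Python sketch, not Feldera's actual API. It represents a table as a multiset of rows and a change as a `(row, weight)` pair, where `+1` means insert and `-1` means delete. The function name `apply_changes` and the sample rows are assumptions made purely for illustration.

```python
from collections import Counter

def apply_changes(table, changes):
    """Apply a batch of (row, weight) changes to a table.

    Hypothetical helper: weight +1 is an insert, -1 is a delete.
    Rows whose count reaches zero are dropped from the table.
    """
    result = Counter(table)
    for row, weight in changes:
        result[row] += weight
        if result[row] == 0:
            del result[row]
    return result

# An insert-only event stream is simply a batch of +1 changes.
events = apply_changes(Counter(), [
    (("alice", "login"), +1),
    (("bob", "login"), +1),
])

# A CDC-style update is a delete of the old row plus an insert of the new one.
users = Counter({("alice", "NYC"): 1})
users = apply_changes(users, [
    (("alice", "NYC"), -1),
    (("alice", "SF"), +1),
])
print(dict(users))
```

Both use cases named above reduce to the same operation: an append-only stream never emits negative weights, while a CDC stream mixes inserts and deletes.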

The following diagram shows Feldera Platform's architecture.

Feldera Platform Architecture

What is in this repository?

This repository comprises all the building blocks to run continuous analytics pipelines using Feldera Platform.

  • web UI: a web interface for writing SQL, setting up connectors, and managing pipelines.
  • pipeline-manager: serves the web UI and is the REST API server for building and managing data pipelines.
  • dbsp: the core engine that allows us to evaluate arbitrary queries incrementally.
  • SQL compiler: translates SQL programs into DBSP programs.
  • connectors: to stream data in and out of Feldera Platform pipelines.

Quick start

First, make sure you have Docker Compose installed.

Next, run the following command to download a Docker Compose file, and use it to bring up a Feldera Platform deployment suitable for demos, development and testing:

curl https://raw.githubusercontent.com/feldera/feldera/main/deploy/docker-compose.yml | docker compose -f - --profile demo up

It can take some time for the container images to be downloaded. About ten seconds after that, the DBSP web interface will become available. Visit http://localhost:8080 in your browser to bring it up. We suggest going through our demo next.
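
Because the web interface only comes up some time after the images are pulled, a script that drives the deployment may want to wait for it. The following is a small sketch using only the Python standard library. The helper name `wait_until_up`, its parameters, and the assumption that the root URL answers with HTTP 200 once the service is ready are all illustrative and not part of Feldera Platform.

```python
import time
import urllib.error
import urllib.request

def wait_until_up(url, timeout_s=120.0, interval_s=2.0, fetch=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Hypothetical helper. `fetch` is injectable so the loop can be
    exercised without a running deployment.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with fetch(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # Service not reachable yet; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Example (requires the Docker Compose deployment above to be starting):
# wait_until_up("http://localhost:8080")
```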

Our Getting Started guide has more detailed instructions on running the demo.

Documentation

To learn more about Feldera Platform, we recommend going through the documentation.

Contributing

Most of the software in this repository is governed by an open-source license. We welcome contributions. Here are some guidelines.

Theory

Feldera Platform achieves its objectives by building on a solid mathematical foundation. The formal model that underpins our system, called DBSP, is described in the accompanying paper.

The model provides two things:

  1. Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.

  2. Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of the input rather than the entire state of the database.
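
As a rough illustration of what "incremental" means here, the sketch below maintains a join over weighted rows. It relies on the standard identity for joins, Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB, and checks the result against a full re-evaluation. This is plain Python written for this README, not the DBSP crate's API. All function names and sample data are assumptions, and the nested-loop `join` is chosen for brevity only; an engine that wants cost proportional to the input change would look up matches through indexes instead.

```python
from collections import Counter

def join(left, right):
    """Join weighted (key, value) rows on key; weights multiply."""
    out = Counter()
    for (lk, lv), lw in left.items():
        for (rk, rv), rw in right.items():
            if lk == rk:
                out[(lk, lv, rv)] += lw * rw
    return out

def add(a, b):
    """Add two weighted collections, dropping rows whose weight becomes zero."""
    out = Counter(a)
    for row, w in b.items():
        out[row] += w
        if out[row] == 0:
            del out[row]
    return out

def negate(a):
    """Flip the sign of every weight (turns inserts into deletes)."""
    return Counter({row: -w for row, w in a.items()})

def incremental_join_step(left, right, d_left, d_right):
    """Change to join(left, right) caused by the deltas d_left and d_right."""
    return add(add(join(d_left, right), join(left, d_right)),
               join(d_left, d_right))

orders = Counter({("c1", "book"): 1, ("c2", "pen"): 1})
customers = Counter({("c1", "Alice"): 1})

# One new order, one new customer, and one customer rename (delete + insert).
d_orders = Counter({("c1", "lamp"): 1})
d_customers = Counter({("c2", "Bob"): 1, ("c1", "Alice"): -1, ("c1", "Alicia"): 1})

delta = incremental_join_step(orders, customers, d_orders, d_customers)

# Cross-check against recomputing the view from scratch and diffing.
old_view = join(orders, customers)
new_view = join(add(orders, d_orders), add(customers, d_customers))
assert delta == add(new_view, negate(old_view))
```

The incremental step touches the full inputs only to find matches for the changed rows, which is where an indexed implementation avoids rescanning the entire state.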
