forked from flow-php/flow

Flow PHP - strongly typed data processing framework

Flow is a strongly typed, asynchronous ETL (Extract, Transform, Load) data processing library for PHP with constant memory consumption.

Supported PHP versions: 8.1, 8.2

Features

  • low and constant memory consumption
  • asynchronous data processing
  • reading from any data source
  • writing to any data source
  • rich collection of data transformation functions
  • direct access to remote filesystems
  • partitioning
  • grouping & aggregating
  • remote file processing
  • joins
  • sorting
  • displaying datasets as ASCII table
  • validation against the schema
  • window functions
  • caching

📈Project Roadmap

Installation

This package is a monorepo. Check the packages below and install only those you are going to use; this reduces the number of unnecessary dependencies in your project (less maintenance).

For example, if you want to work with JSON/CSV files, here are the dependencies you will need to install:

composer require flow-php/etl:^0.1 flow-php/etl-adapter-csv:^0.1 flow-php/etl-adapter-json:^0.1

Usage

To understand how Flow works, please read the documentation.

Building blocks

  • DataFrame - Lazy data processing frame.
  • Rows - Immutable collection of Row objects.
  • Row - Immutable, strongly typed collection of Entry objects.
  • Entry - Immutable, strongly typed object representing a cell in a row.
  • Extractor (Reader) - Memory-safe data source that returns a \Generator yielding Rows to the Pipeline.
  • Transformer - Data transformer receiving and returning Rows (in most cases transformed), one instance of Rows at a time.
  • Loader (Writer) - Memory-safe representation of a data sink; the Loader is responsible for writing Rows into destination storage, one at a time.
  • Pipeline - Interface representing the ETL process; each received Rows instance is passed through all Pipes. Also responsible for error handling.
  • Pipe - Loader or Transformer instance existing in the Pipes collection.
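
How these building blocks fit together can be sketched in plain PHP, without Flow itself. This is a minimal illustration and not Flow's API: `extract_batches`, `run_pipeline`, `$addTotal`, and `$toSink` are hypothetical names, plain arrays stand in for Rows and Row, and error handling is left out. It shows an extractor returning a \Generator, a transformer receiving and returning one batch at a time, and a loader writing each batch to a sink, so that only one batch is held in memory at once.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical extractor: yields rows in fixed-size batches, so only one
 * batch is held in memory at a time.
 *
 * @param iterable<mixed> $source
 */
function extract_batches(iterable $source, int $batchSize): \Generator
{
    $batch = [];
    foreach ($source as $row) {
        $batch[] = $row;
        if (count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // flush the final, possibly smaller batch
    }
}

/**
 * Hypothetical pipeline: passes each batch through every transformer in
 * order, then hands the result to every loader.
 *
 * @param iterable<array> $batches
 * @param array<callable> $transformers each takes a batch and returns a batch
 * @param array<callable> $loaders      each takes a batch and returns nothing
 */
function run_pipeline(iterable $batches, array $transformers, array $loaders): void
{
    foreach ($batches as $batch) {
        foreach ($transformers as $transform) {
            $batch = $transform($batch);
        }
        foreach ($loaders as $load) {
            $load($batch);
        }
    }
}

// Transformer: returns a new batch with a computed column; input rows are not mutated.
$addTotal = static fn (array $batch): array => array_map(
    static fn (array $row): array => $row + ['total' => $row['price'] * $row['qty']],
    $batch
);

// Loader: appends each batch to an in-memory sink (stands in for a file or database).
$sink = [];
$toSink = static function (array $batch) use (&$sink): void {
    foreach ($batch as $row) {
        $sink[] = $row;
    }
};

$source = [
    ['price' => 10, 'qty' => 2],
    ['price' => 5, 'qty' => 3],
    ['price' => 7, 'qty' => 1],
];

run_pipeline(extract_batches($source, 2), [$addTotal], [$toSink]);
```

Because the extractor is a generator, the pipeline pulls the next batch only after the previous one has been transformed and loaded. That is the property the "constant memory consumption" feature refers to, shown here in a much simplified form.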

Asynchronous Processing

Sponsors

Flow PHP is sponsored by:

  • Blackfire - the best PHP profiling and monitoring tool!
