Flow is a PHP-based, strongly typed ETL (Extract, Transform, Load) library for asynchronous data processing with constant memory consumption.
- low and constant memory consumption
- asynchronous data processing
- reading from any data source
- writing to any data source
- rich collection of data transformation functions
- direct access to remote filesystems
- partitioning
- grouping & aggregating
- remote file processing
- joins
- sorting
- displaying datasets as ASCII tables
- validation against a schema
- window functions
- caching
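The constant memory consumption listed first comes down to streaming. As the Extractor description in this README notes, data is delivered through a \Generator, so rows can be pulled in fixed-size batches instead of being loaded all at once, and peak memory then depends on the batch size rather than on the dataset size. The plain-PHP sketch below illustrates only that idea, combined with a simple aggregation. It does not use Flow's API, and `readInBatches` and `generateRows` are hypothetical helpers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extractor: yields rows from any iterable source in batches
 * of $batchSize, so only one batch is held in memory at a time.
 */
function readInBatches(iterable $source, int $batchSize): \Generator
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    foreach ($source as $row) {
        $batch[] = $row;
        if (\count($batch) === $batchSize) {
            yield $batch;
            $batch = []; // drop the emitted batch before reading further
        }
    }

    if ($batch !== []) {
        yield $batch; // flush the last, possibly smaller, batch
    }
}

/** Hypothetical data source that produces rows lazily, one at a time. */
function generateRows(int $count): \Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'amount' => $i * 10];
    }
}

// Aggregate a column over 1,000 rows while holding at most 100 rows at once.
$batches = 0;
$total = 0;
foreach (readInBatches(generateRows(1_000), 100) as $batch) {
    $batches++;
    $total += array_sum(array_column($batch, 'amount'));
}

echo "batches: {$batches}, total: {$total}\n"; // batches: 10, total: 5005000
```

Because both functions are generators, no row is produced until the consumer asks for it, which is what keeps memory flat regardless of how many rows the source holds.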
This repository is a monorepo. Check the packages below and install only those you are going to use; this keeps unnecessary dependencies out of your project and reduces maintenance.
- ETL
- Adapters
- Libraries
For example, to work with JSON and CSV files, install the following dependencies:
composer require flow-php/etl:^0.1 flow-php/etl-adapter-csv:^0.1 flow-php/etl-adapter-json:^0.1
To understand how Flow works, please read the documentation. The core building blocks are:
- DataFrame - Lazy data processing frame.
- Rows - Immutable collection of Row objects.
- Row - Immutable, strongly typed collection of Entry objects.
- Entry - Immutable, strongly typed object representing a cell in a row.
- Extractor (Reader) - Memory-safe data source returning a \Generator that yields Rows to the Pipeline.
- Transformer - Data transformer receiving and returning Rows (in most cases transformed), one Rows instance at a time.
- Loader (Writer) - Memory-safe representation of a data sink; the Loader is responsible for writing Rows into the destination storage, one at a time.
- Pipeline - Interface representing the ETL process; each received Rows instance is passed through all Pipes. The Pipeline is also responsible for error handling.
- Pipe - Loader or Transformer instance in the Pipes collection.
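To make the relationships between these building blocks concrete, here is a deliberately simplified, self-contained PHP model of them (PHP 8.1+ for readonly properties). All class and interface names are hypothetical stand-ins chosen to mirror the list; they are not Flow's actual classes, signatures or namespaces, and the sketch leaves out DataFrame, schemas and the Pipeline's error handling.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified model of the building blocks listed above.
// These are NOT Flow's real classes or interfaces; all names are illustrative.

final class Entry // immutable, typed cell (only int values in this sketch)
{
    public function __construct(public readonly string $name, public readonly int $value) {}
}

final class Row // immutable collection of entries, keyed by entry name
{
    /** @var array<string, Entry> */
    private array $entries = [];

    public function __construct(Entry ...$entries)
    {
        foreach ($entries as $entry) {
            $this->entries[$entry->name] = $entry;
        }
    }

    public function get(string $name): Entry
    {
        return $this->entries[$name];
    }

    public function with(Entry $entry): self // returns a modified copy; $this is unchanged
    {
        $copy = clone $this;
        $copy->entries[$entry->name] = $entry;
        return $copy;
    }
}

final class Rows // immutable collection of Row objects
{
    /** @var list<Row> */
    public readonly array $all;

    public function __construct(Row ...$rows) { $this->all = $rows; }
    public function map(callable $fn): self { return new self(...array_map($fn, $this->all)); }
}

interface Extractor { public function extract(): \Generator; } // yields Rows instances
interface Transformer { public function transform(Rows $rows): Rows; }
interface Loader { public function load(Rows $rows): void; }

final class Pipeline // passes each Rows instance through all pipes, in order
{
    /** @param list<Transformer|Loader> $pipes */
    public function __construct(private Extractor $extractor, private array $pipes) {}

    public function run(): void // error handling is omitted in this sketch
    {
        foreach ($this->extractor->extract() as $rows) {
            foreach ($this->pipes as $pipe) {
                if ($pipe instanceof Transformer) {
                    $rows = $pipe->transform($rows);
                } else {
                    $pipe->load($rows);
                }
            }
        }
    }
}

$extractor = new class implements Extractor {
    public function extract(): \Generator
    {
        yield new Rows(new Row(new Entry('id', 1)), new Row(new Entry('id', 2)));
        yield new Rows(new Row(new Entry('id', 3))); // next batch is produced only on demand
    }
};

$double = new class implements Transformer {
    public function transform(Rows $rows): Rows
    {
        return $rows->map(fn (Row $r): Row => $r->with(new Entry('id', $r->get('id')->value * 2)));
    }
};

$collector = new class implements Loader {
    public array $written = [];

    public function load(Rows $rows): void
    {
        foreach ($rows->all as $row) {
            $this->written[] = $row->get('id')->value;
        }
    }
};

(new Pipeline($extractor, [$double, $collector]))->run();
echo implode(', ', $collector->written), "\n"; // 2, 4, 6
```

Running it prints `2, 4, 6`: the extractor yields two Rows batches lazily, the transformer returns new Row objects instead of mutating the originals, and the loader receives each batch only after every earlier pipe has processed it.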
Flow PHP is sponsored by:
- Blackfire - the best PHP profiling and monitoring tool!