Awesome list of DataOps open source software, online services, courses and use cases
- Apache Airflow - Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
- Apache Oozie - Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
- Dagster - A Python library for building data applications: ETL, ML, Data Pipelines, and more.
- dbt command-line tool - the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
- Reflow - A language and runtime for distributed, incremental data processing in the cloud
- Apache Kafka - a distributed streaming platform.
- Apache NiFi - an easy to use, powerful, and reliable system to process and distribute data.
- Squirrel - a Python library for large-scale data loading, transforming and sharing.
- Astronomer - spin up and scale Apache Airflow clusters
- Databand - Databand tracks your pipeline execution metadata, so you can evaluate changes in runtimes, code, data, and critical business KPIs.
- DataKitchen - an end-to-end DataOps platform that automates and coordinates all the people, tools, and environments in your entire data analytics organization: everything from orchestration, testing, and monitoring to development and deployment.
- Prefect - a new workflow management system, designed for modern infrastructure and powered by open-source software.
- Saagie - Saagie DataOps Orchestrator integrates commercial and open source data technologies to accelerate project delivery
- Unravel - helps ops engineers, app developers, and enterprise architects reduce the complexity of delivering reliable application performance – providing unified visibility and operational intelligence to optimize your entire ecosystem
- AWS Glue - a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
- Azure Data Factory - a hybrid data integration service for simplified ETL operations
- Google Cloud Dataflow - unified stream and batch data processing that's serverless, fast, and cost-effective.
- ETLWorks - a cloud-first, any-to-any data integration platform
- Alation Data Catalog - a data catalog designed for human collaboration
- Collibra Data Catalog - empowers business users to quickly discover and understand data that matters
- SQL Data Catalog - a tool to discover and classify sensitive data in MS SQL Server
- RightData - a data testing, reconciliation, and validation suite that helps stakeholders identify issues related to data consistency, quality, completeness, and gaps.