A curated list of awesome Lakehouse frameworks, applications, etc.
- Apache Iceberg [Java] - a high-performance format for huge analytic tables, bringing the reliability and simplicity of SQL tables to big data.
- Apache Hudi [Java] - a transactional data lake platform that brings database and data warehouse capabilities to the data lake.
- Apache Paimon (incubating) [Java] - a streaming data lake platform with high-speed data ingestion, changelog tracking and efficient real-time analytics.
- Apache XTable (incubating) [Java] - a cross-table converter for table formats that facilitates omni-directional interoperability across data processing systems and query engines.
- Delta [Scala] - an open-source storage framework that enables building a Lakehouse architecture with various compute engines and languages.
- Apache Amoro (incubating) [Java] - a management system built on open data lake formats, bringing pluggable and self-managed features to the Lakehouse.
- GeoLake [Java] - a universal solution for geospatial data, tailored to data lakehouse systems.
- LakeSoul [Rust] - a cloud-native Lakehouse framework that supports scalable metadata management, ACID transactions, efficient and flexible upsert operations, schema evolution, and unified streaming and batch processing.
- Lakehouse Engine [Python] - a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
- Smart Data Lake [Scala] - a data lake automation framework that makes loading and transforming data a breeze.
- Apache Gravitino [Java] - a high-performance, geo-distributed, and federated metadata lake.
- DeltaCAT [Python] - a Pythonic data catalog powered by Ray.
- lakeFS [Go] - data version control for data lakes.
- Metacat [Java] - a unified metadata exploration API service.
- Nessie [Java] - a transactional catalog for data lakes with Git-like semantics.
- OpenHouse [Java] - an open-source control plane designed for efficient management of tables within open data lakehouse deployments.
- Polaris [Java] - an interoperable, open-source catalog for Apache Iceberg.
- UnityCatalog [Java] - an open and interoperable catalog for data and AI.
- Space [Python] - a unified storage framework for the entire machine learning lifecycle.