Marquez is a core service for the collection, aggregation, and visualization of metadata within a data ecosystem. It maintains the provenance of how datasets are consumed and produced, provides visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more.
This project is under active development at WeWork and Stitch Fix (in collaboration with many other organizations).
The Marquez design is being actively updated and is open for comments.
- Java 8 or above
- PostgreSQL database
- Gradle 4.9 or above
To build the entire project run:
$ ./gradlew shadowJar
The executable can be found under build/libs/.
Note: When creating your database, we recommend calling it marquez.
To run Marquez, you will have to define config.yml. The configuration file is used to specify your database connection. Please copy and edit config.example.yml:
$ cp config.example.yml config.yml
Edit the following parameters in the config.yml you created based on your environment:
DB name (must be created beforehand): POSTGRESQL_DB_NAME
DB user: POSTGRESQL_USER
DB password: POSTGRESQL_PASSWORD
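If you prefer to script this edit, the following is a minimal sketch. It assumes that the three names listed above appear as literal placeholder tokens in the config.yml you copied. The layout of config.example.yml is not shown here, so check your file first. The function name fill_config is hypothetical and is not part of Marquez.

```shell
# Replace the placeholder tokens in a config file with real values.
# Assumption: the file contains the literal tokens POSTGRESQL_DB_NAME,
# POSTGRESQL_USER, and POSTGRESQL_PASSWORD.
# Usage: fill_config <config-file> <db-name> <db-user> <db-password>
fill_config() {
  config="$1"
  db_name="$2"
  db_user="$3"
  db_password="$4"

  # Write to a temporary file first so that a failed substitution
  # cannot leave the config file truncated.
  sed -e "s|POSTGRESQL_DB_NAME|${db_name}|g" \
      -e "s|POSTGRESQL_USER|${db_user}|g" \
      -e "s|POSTGRESQL_PASSWORD|${db_password}|g" \
      "${config}" > "${config}.tmp" && mv "${config}.tmp" "${config}"
}
```

For example, `fill_config config.yml marquez my_user my_password` (the user and password here are made-up values). Values containing `|`, `&`, or `\` would need escaping before being passed to sed, so edit the file by hand if your password uses those characters.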
Then run the database migration:
$ ./gradlew run --args 'db migrate config.yml'
Running the Application
$ ./gradlew run --args 'server config.yml'
Then browse to the admin interface: http://localhost:8081
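The server may take a few seconds to start. As a rough sketch, the helper below polls a URL with curl until it responds or a retry limit is reached. The function name wait_for_url and the default of 30 attempts are illustrative choices, not part of Marquez, and the sketch assumes curl is installed.

```shell
# Poll a URL once per second until it answers with a successful HTTP status.
# Usage: wait_for_url <url> [max-attempts]
# Returns 0 as soon as the URL responds, 1 if every attempt fails.
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error statuses (4xx/5xx).
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_url http://localhost:8081 && echo "admin interface is up"` blocks until the admin interface answers or 30 attempts have failed.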