A repository showing how to manage dbt projects and push them to cloud Docker repositories.
Start each project using dbt init. For consistency, create a profiles directory inside the project directory and add a profiles.yml file there (an example follows the commands below).
$ dbt init dbt_project_3
$ cd dbt_project_3
$ mkdir profiles
$ touch profiles/profiles.yml
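What goes into profiles.yml depends on your warehouse. A minimal sketch, assuming a Postgres target with credentials read from environment variables; the host, database, and variable names are placeholders, and the top-level key must match the profile name in dbt_project.yml:

dbt_project_3:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_project_3
      threads: 4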
Add every project to the Dockerfile to make sure it is included in the Docker image.
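# Repeat for each project: copy it into the image and install its dbt package dependencies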
RUN mkdir dbt_project_3
COPY dbt_project_3 ./dbt_project_3
RUN ["dbt", "deps", "--project-dir", "./dbt_project_3"]
$ docker build -t <IMAGE_NAME>:<IMAGE_TAG> .
$ docker run <IMAGE_NAME>:<IMAGE_TAG> dbt run --project-dir ./<project_dir> --profiles-dir ./<project_dir>/profiles
To run a project locally, create a virtual environment, install the requirements, and run dbt against the project's profiles directory:
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ cd <your_project_dir>
$ dbt seed --profiles-dir ./profiles
$ dbt run --profiles-dir ./profiles
GitHub Actions workflows that push to GCP Artifact Registry and AWS ECR are already included in this repo.
Check out .github/workflows and fill in the required credentials as repository secrets (a sketch of how such secrets are consumed follows the list below). Also make sure the respective repositories already exist in GCP or AWS.
Required credentials are:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
GCP_CREDENTIALS (as JSON)
GCP_PROJECT_ID
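The included workflows use these secrets in their authentication steps. As a rough illustration of how such secrets are wired into workflow steps (a sketch, not the repo's actual workflows; action versions and the region are assumptions):

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
      aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      aws-region: eu-west-1
  - uses: aws-actions/amazon-ecr-login@v2
  - uses: google-github-actions/auth@v2
    with:
      credentials_json: ${{ secrets.GCP_CREDENTIALS }}
  # GCP_PROJECT_ID is typically interpolated into the Artifact Registry image path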
When running these dbt images in Airflow with the KubernetesPodOperator:
# The import path depends on the cncf.kubernetes provider version
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

migrate_data = KubernetesPodOperator(
    namespace='default',
    image='europe-west1-docker.pkg.dev/PROJECT-ID/transformations-repository/dbt-transformations:latest',
    cmds=["dbt", "run"],
    arguments=[
        "--project-dir", "./<project_dir>", "--profiles-dir", "./<project_dir>/profiles"
    ],
    name="dbt_transformations",
    task_id="dbt_transformations",
    get_logs=True,
)