This repo shows how to maintain "type-2 Slowly Changing Dimensions" snapshots of a mutable table using DBT. What this is used for is to provide a "look back in time" view of a table which is updated in place (i.e. new data replaces old data).
The data warehouse I'm using for this example is google BigQuery.
# install dependencies #
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install dbt-core dbt-bigquery
-- create bigquery dataset --
CREATE SCHEMA `your-gcp-project-id.dbt_snapshot_example`
OPTIONS (
location = 'europe-west2'
)
;
First, I create a table called users, and insert a single user with status 'active'
CREATE TABLE `your-gcp-project-id.dbt_snapshot_example.users` (
user_name STRING,
user_status STRING,
row_updated_at TIMESTAMP
);
INSERT INTO `your-gcp-project-id.dbt_snapshot_example.users`
(user_name, user_status, row_updated_at)
VALUES ('joe', 'active', CURRENT_TIMESTAMP())
;
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users`
;
user_name | user_status | row_updated_at |
---|---|---|
joe | active | 2024-09-01 12:43:36 UTC |
I take a snapshot of the users table - the first time the dbt snapshot command is run, it just creates a copy of the users table (with some additional columns):
dbt snapshot --select users_snapshot_timestamp_strategy
Here is how the snapshot table looks:
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users_snapshot_timestamp_strategy`
;
user_name | user_status | row_updated_at | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|---|---|
joe | active | 2024-09-01 12:43:36 UTC | 7434bb97e2fb34e2c0b481959d70d155 | 2024-09-01 12:43:36 UTC | 2024-09-01 12:43:36 UTC |
Now, I change the status of the user 'joe' in the users table to 'dormant' by overwriting the value in the user_status column.
UPDATE `your-gcp-project-id.dbt_snapshot_example.users`
SET user_status = 'dormant'
, row_updated_at = CURRENT_TIMESTAMP()
WHERE user_name = 'joe'
;
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users`
;
user_name | user_status | row_updated_at |
---|---|---|
joe | dormant | 2024-09-01 12:57:10 UTC |
Running dbt snapshot again appends a new row to the snapshot table, recording this row change:
dbt snapshot --select users_snapshot_timestamp_strategy
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users_snapshot_timestamp_strategy`
;
user_name | user_status | row_updated_at | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|---|---|
joe | dormant | 2024-09-01 12:57:10 UTC | 27d047306121e541ed7a8a82b7d7758e | 2024-09-01 12:57:10 UTC | 2024-09-01 12:57:10 UTC | null |
joe | active | 2024-09-01 12:43:36 UTC | 7434bb97e2fb34e2c0b481959d70d155 | 2024-09-01 12:43:36 UTC | 2024-09-01 12:43:36 UTC | 2024-09-01 12:57:10 UTC |
DROP TABLE IF EXISTS `your-gcp-project-id.dbt_snapshot_example.users`
;
CREATE TABLE `your-gcp-project-id.dbt_snapshot_example.users` (
user_name STRING,
user_status STRING,
);
INSERT INTO `your-gcp-project-id.dbt_snapshot_example.users`
(user_name, user_status)
VALUES ('joe', 'active')
;
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users`
;
user_name | user_status |
---|---|
joe | active |
dbt snapshot --select users_snapshot_check_strategy
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users_snapshot_check_strategy`
;
user_name | user_status | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|---|
joe | active | 5dcf080ad5fc313f0781899e7d9ec442 | 2024-09-01 13:44:51 UTC | 2024-09-01 13:44:51 UTC | null |
UPDATE `your-gcp-project-id.dbt_snapshot_example.users`
SET user_status = 'dormant'
WHERE user_name = 'joe'
;
SELECT * FROM `your-gcp-project-id.dbt_snapshot_example.users`
;
user_name | user_status |
---|---|
joe | dormant |
dbt snapshot --select users_snapshot_check_strategy
SELECT *
FROM `your-gcp-project-id.dbt_snapshot_example.users_snapshot_check_strategy`
;
user_name | user_status | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|---|
joe | active | 5dcf080ad5fc313f0781899e7d9ec442 | 2024-09-01 13:44:51 UTC | 2024-09-01 13:44:51 UTC | 2024-09-01 13:47:43 UTC |
joe | dormant | 3928a06a330872968ed3aa305625b730 | 2024-09-01 13:47:43 UTC | 2024-09-01 13:47:43 UTC | null |