Note
元のリポジトリに.devcontainerディレクトリを加え、Dockerコンテナを作成しコンテナ内部でStreamlitのアプリを起動するようにしています。 VSCodeのRemote-Containersを使用していますが、docker-compose upでも起動できると思います。
The Semantic Model Generator
is used to generate a semantic model for use in your Snowflake account.
You can generate semantic models through our Streamlit app, the command line, or directly in your Python code. Please complete the instructions in setup, then proceed to the instructions for your desired approach.
If you want to see what a semantic model looks like, skip to Examples.
- semantic-model-generator
Note
元のリポジトリの説明に追記した部分です。
.envファイルを.devcontainerディレクトリ配下に作成してください。 ここでは、認証方法として「1. Username and Password」を使用しているので、それに必要な環境変数をセットします。
COMPOSE_PROJECT_NAME=<project_name>
PORTS_NUM_ST1=<port_number>
SNOWFLAKE_ROLE="<my_role>"
SNOWFLAKE_WAREHOUSE="<my_warehouse>"
SNOWFLAKE_USER="<my_user>"
SNOWFLAKE_PASSWORD="<my_pw>"
SNOWFLAKE_ACCOUNT_LOCATOR="<my_snowflake_account>"
SNOWFLAKE_HOST="<my_snowflake_host>"
以下の変数は新たに追加したものです。
- COMPOSE_PROJECT_NAME: プロジェクト名を指定します。コンテナ名もここで指定したものになります
- PORTS_NUM_ST1: Streamlitを起動するためのポート番号を指定します。
拡張機能の部分は必要に応じて自分好みに編集してください。
VSCodeであればコマンドパレットを開いてDev Containers: Reopen in Container
を選択すると作成できます。
あるいは、.devcontainerディレクトリに移動し、ターミナルからdocker-compose up
でも起動できます。
以降はコンテナ内での作業になります。 Streamlit Appの手順に従ってSemantic Model Generatorを起動してください。
We currently leverage credentials saved as environment variables.
A. To find your Account locator, please execute the following sql command in your account.
SELECT CURRENT_ACCOUNT_LOCATOR();
B. To find the SNOWFLAKE_HOST
for your
account, follow these instructions.
- Currently we recommend you to look under the
Account locator (legacy)
method of connection for better compatibility on API. - It typically follows format of:
<accountlocator>.<region>.<cloud>.snowflakecomputing.com
. Ensure that you omit thehttps://
prefix. SNOWFLAKE_HOST
is required if you are using the Streamlit app, but may not be required for the CLI tool depending on your Snowflake deployment. We would recommend setting it regardless.
We recommend setting these environment variables by creating a .env
file in the root directory of this repo. See the
examples in .env.example
for reference and proper syntax for .env
files.
However, if you would like to set these variables directly in your shell/Python environment,
- MacOS/Linux syntax:
export SNOWFLAKE_ROLE="<your-snowflake-role>"
export SNOWFLAKE_WAREHOUSE="<your-snowflake-warehouse>"
export SNOWFLAKE_USER="<your-snowflake-user>"
export SNOWFLAKE_ACCOUNT_LOCATOR="<your-snowflake-account-locator>"
export SNOWFLAKE_HOST="<your-snowflake-host>"
- Windows syntax:
set SNOWFLAKE_ROLE=<your-snowflake-role>
set SNOWFLAKE_WAREHOUSE=<your-snowflake-warehouse>
set SNOWFLAKE_USER=<your-snowflake-user>
set SNOWFLAKE_ACCOUNT_LOCATOR=<your-snowflake-account-locator>
set SNOWFLAKE_HOST=<your-snowflake-host>
- Python syntax:
import os
# Setting environment variables
os.environ['SNOWFLAKE_ROLE'] = '<your-snowflake-role>'
os.environ['SNOWFLAKE_WAREHOUSE'] = '<your-snowflake-warehouse>'
os.environ['SNOWFLAKE_USER'] = '<your-snowflake-user>'
os.environ['SNOWFLAKE_ACCOUNT_LOCATOR'] = '<your-snowflake-account-locator>'
os.environ['SNOWFLAKE_HOST'] = '<your-snowflake-host>'
Our semantic model generators currently support three types of authentication. If no SNOWFLAKE_AUTHENTICATOR
environment variable
is set, the default is snowflake
, which uses standard username/password support.
- Username and Password
Note: If you have MFA enabled, using this default authenticator should send a push notification to your device.
# no SNOWFLAKE_AUTHENTICATOR needed
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
# MacOS/Linux
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
# Windows
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
# Python
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
- Username/Password with MFA passcode
Using a passcode from your authenticator app:
SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
SNOWFLAKE_MFA_PASSCODE="<your-snowflake-mfa-passcode>" # if your authenticator app reads "123 456", fill in "123456" (No spaces)
# MacOS/Linux
export SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
export SNOWFLAKE_MFA_PASSCODE="<your-snowflake-mfa-passcode>"
# Windows
set SNOWFLAKE_AUTHENTICATOR=username_password_mfa
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
set SNOWFLAKE_MFA_PASSCODE=<your-snowflake-mfa-passcode>
# Python
os.environ['SNOWFLAKE_AUTHENTICATOR'] = 'username_password_mfa'
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
os.environ['SNOWFLAKE_MFA_PASSCODE'] = '<your-snowflake-mfa-passcode>'
Using a passcode embedded in the password:
SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD="true"
# MacOS/Linux
export SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
export SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD="true"
# Windows
set SNOWFLAKE_AUTHENTICATOR=username_password_mfa
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
set SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD=true
# Python
os.environ['SNOWFLAKE_AUTHENTICATOR'] = 'username_password_mfa'
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
os.environ['SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD'] = 'true'
- Single Sign-On (SSO) with Okta
# no SNOWFLAKE_PASSWORD needed
SNOWFLAKE_AUTHENTICATOR="externalbrowser"
# MacOS/Linux
export SNOWFLAKE_AUTHENTICATOR="externalbrowser"
# Windows
set SNOWFLAKE_AUTHENTICATOR=externalbrowser
# Python
os.environ['SNOWFLAKE_AUTHENTICATOR'] = 'externalbrowser'
We offer a convenient Streamlit app that supports creating semantic models from scratch as well as iterating on existing ones uploaded to a Snowflake stage.
To install dependencies for the Streamlit app, run
make setup_admin_app
This uses pip
to install dependencies from the root pyproject.toml
; feel free to use conda
or any package manager
you prefer.
Once installed, you can run the app using the provided Makefile target, or with your current version of Python manually specified:
# Make target
make run_admin_app
# directly
python3.11 -m streamlit run admin_apps/app.py
The generator supports merging data from your existing semantic models built with partners such as dbt and Looker. To use the Looker features, please install the extras:
pip install -e ".[looker]"
You may also generate a semantic model directly from the CLI. To do this, first install the CLI tool dependencies, which differ from the Streamlit app's dependencies.
Unlike the Streamlit route above, using the CLI assumes that you will manage your environment using poetry
and pyenv
for Python versions.
This has only been tested on MacOS/Linux.
- If you need brew, run
make install-homebrew
. - If you need pyenv,
make install-pyenv
andmake install-python-3.8
. - Run
make setup
to install all external dependencies into your Poetry environment. This will also installpoetry
if needed. - Spawn a shell in the virtual environment using
poetry shell
. This will activate your virtual environment.
You are now ready to generate semantic models via the CLI! The generation command uses the following syntax:
python -m semantic_model_generator.generate_model \
--base_tables "['<your-database-name-1>.<your-schema-name-1>.<your-base-table-or-view-name-1>','<your-database-name-2>.<your-schema-name-2>.<your-base-table-or-view-name-2>']" \
--semantic_model_name "<a-meaningful-semantic-model-name>" \
--snowflake_account="<your-snowflake-account>"
You may generate a semantic model for a given list of fully qualified tables following the {database}.{schema}.{table}
format. Each table in this list should be a physical table or a view present in your database.
All generated semantic models by default are saved either under semantic_model_generator/output_models
if running from
the root of this project or the current directory you're in.
You may also use the CLI tool to validate one of your semantic models. From inside your Poetry shell, run
python -m semantic_model_generator.validate_model \
--yaml_path="/path/to/your/model_yaml.yaml \
--snowflake_account="<your-account-name>"
You may also create/validate your semantic models from directly within your Python code. First, ensure that you have
installed the Python package. Note, the version below should be the latest version under the dist/
directory.
pip install dist/*.whl
from semantic_model_generator.generate_model import generate_base_semantic_model_from_snowflake
BASE_TABLES = ['<your-database-name-1>.<your-schema-name-1>.<your-base-table-or-view-name-1>',
'<your-database-name-2>.<your-schema-name-2>.<your-base-table-or-view-name-2>']
SNOWFLAKE_ACCOUNT = "<your-snowflake-account>"
SEMANTIC_MODEL_NAME = "<a-meaningful-semantic-model-name>"
generate_base_semantic_model_from_snowflake(
base_tables=BASE_TABLES,
snowflake_account=SNOWFLAKE_ACCOUNT,
semantic_model_name=SEMANTIC_MODEL_NAME,
)
from semantic_model_generator.validate_model import validate_from_local_path
YAML_PATH = "/path/to/your/model_yaml.yaml"
SNOWFLAKE_ACCOUNT = "<your-snowflake-account>"
validate_from_local_path(
yaml_path=YAML_PATH,
snowflake_account=SNOWFLAKE_ACCOUNT
)
Due to context window as well as quality constraints, we currently limit the size of the generated semantic model to < 30,980 tokens (~123,920 characters).
Please note sample values and verified queries is not counted into this token length constraints. You can include as much sample values or verified queries as you'd like with limiting the overall file to <1MB.
If your snowflake tables and comments do not have comments, we currently leverages cortex LLM function to auto-generate description suggestions. Those generation are suffixed with '__' and additional comment to remind you to confirm/modity the descriptions.
IMPORTANT: After generation, your YAML files will have a series of lines with # <FILL-OUT>
. Please take the time
to fill these out with your business context, or else subsequent validation of your model will fail.
By default, the generated semantic model will contain all columns from the provided tables/views. However, it's highly encouraged to only keep relevant columns and drop any unwanted columns from the generated semantic model.
In addition, consider adding the following elements to your semantic model:
- Logical columns for a given table/view that are expressions over physical columns.
- Example:
col1 - col2
could be theexpr
for a logical column.
- Example:
- Synonyms. Any additional synonyms for column names.
- Filters. Additional filters with their relevant
expr
.
We continue to add support for partner semantic and metric layers. Our aim is to expedite the creation of Cortex Analyst semantic files using logic and metadata from partner tools. Please see below for details about current partner support.
IMPORTANT: Use the Streamlit App to leverage existing partner semantic/metric layers.
Tool | Method | Requirements |
---|---|---|
DBT | We extract and translate metadata from semantic_models in uploaded DBT yaml file(s) and merge with a generated Cortex Analyst semantic file table-by-table. | DBT models and sources leading up to the semantic model layer(s) must be tables/views in Snowflake. |
Looker | We materialize your Explore dataset in Looker as Snowflake table(s) and generate a Cortex Analyst semantic file. Metadata from your Explore fields can be merged with the generated Cortex Analyst semantic file. | Looker Views referenced in the Looker Explores must be tables/views in Snowflake. Looker SDK credentials are required. Visit Looker Authentication SDK Docs for more information. Install Looker's API Explorer extension from the Looker Marketplace to view API credentials directly. |
If you have an example table in your account with the following DDL statements.
CREATE TABLE sales.public.sd_data
(
id SERIAL PRIMARY KEY,
dt DATETIME,
cat VARCHAR(255),
loc VARCHAR(255),
cntry VARCHAR(255),
chn VARCHAR(50),
amt DECIMAL(10, 2),
unts INT,
cst DECIMAL(10, 2)
);
Here is an example semantic model, with data elements automatically generated from this repo and filled out by a user.
# Name and description of the semantic model.
name: Sales Data
description: This semantic model can be used for asking questions over the sales data.
# A semantic model can contain one or more tables.
tables:
# A logical table on top of the 'sd_data' base table.
- name: sales_data
description: A logical table capturing daily sales information across different store locations and product categories.
# The fully qualified name of the base table.
base_table:
database: sales
schema: public
table: sd_data
# Dimension columns in the logical table.
dimensions:
- name: product_category
synonyms:
- "item_category"
- "product_type"
description: The category of the product sold.
expr: cat
data_type: NUMBER
unique: false
sample_values:
- "501"
- "544"
- name: store_country
description: The country where the sale took place.
expr: cntry
data_type: TEXT
unique: false
sample_values:
- "USA"
- "GBR"
- name: sales_channel
synonyms:
- "channel"
- "distribution_channel"
description: The channel through which the sale was made.
expr: chn
data_type: TEXT
unique: false
sample_values:
- "FB"
- "GOOGLE"
# Time dimension columns in the logical table.
time_dimensions:
- name: sale_timestamp
synonyms:
- "time_of_sale"
- "transaction_time"
description: The time when the sale occurred. In UTC.
expr: dt
data_type: TIMESTAMP
unique: false
# Measure columns in the logical table.
measures:
- name: sales_amount
synonyms:
- "revenue"
- "total_sales"
description: The total amount of money generated from the sale.
expr: amt
data_type: NUMBER
default_aggregation: sum
- name: sales_tax
description: The sales tax paid for this sale.
expr: amt * 0.0975
data_type: NUMBER
default_aggregation: sum
- name: units_sold
synonyms:
- "quantity_sold"
- "number_of_units"
description: The number of units sold in the transaction.
expr: unts
data_type: NUMBER
default_aggregation: sum
- name: cost
description: The cost of the product sold.
expr: cst
data_type: NUMBER
default_aggregation: sum
- name: profit
synonyms:
- "earnings"
- "net income"
description: The profit generated from a sale.
expr: amt - cst
data_type: NUMBER
default_aggregation: sum
# A table can define commonly used filters over it. These filters can then be referenced in user questions directly.
filters:
- name: north_america
synonyms:
- "North America"
- "N.A."
- "NA"
description: "A filter to restrict only to north american countries"
expr: cntry IN ('canada', 'mexico', 'usa')
In order to push a new build and release, follow the steps below. Note, only admins are allowed to push release/v
tags.
You should follow the setup commands from usage-cli to install poetry and create your environment.
- Checkout a new branch from main. You should name this branch
release/vYYYY-MM-DD
. - Bump the poetry:
poetry version patch
- increments0.1.x
to0.1.(x+1)
poetry version minor
- increments0.x.0
to0.(x+1).0
poetry version major
- incrementsx.0.0
to(x+1).0.0
- Update the
CHANGELOG.md
adding a relevant header for your version number along with a description of the changes made. - Run
make build
to create a new .whl file. - Push your files for approval.
- After approval, run
make release
which will cut a new release and attach the .whl file. - Merge in your pr.
- Note: If you
make release
does not trigger the GH action. Please delete the tag and push again.