There can be many motivations behind taking your ML application setup to a cloud environment, from needing specialized compute 💪 for training jobs to having a 24x7 load-balanced deployment of your trained model serving user requests 🚀.
We know that setting up an MLOps stack can be daunting. There are many components (and the number keeps growing), each with its own requirements. To make your life easier, we already have a documentation page that shows you different ways of switching to a production-grade setting. This recipe, however, goes one step further.
You can have a simple MLOps stack ready for running your machine learning workloads after you execute this recipe 😍. It sets up the following resources:
- An Azure ML Workspace and cluster that can act as a step operator for your workloads.
- An Azure Blob Storage Container as an artifact store, which can be used to store all your ML artifacts like the model, checkpoints, etc.
For each AzureML workspace, AzureML automatically provisions a storage account, Application Insights, a key vault, a container registry and an MLflow tracking server.
Keep in mind that this is a basic setup to get you up and running on Azure with a minimal MLOps stack; more configuration options are coming in the form of new recipes! 👀
- You must have an Azure account with sufficient permissions to create and destroy the resources that this recipe creates. Supply the name of your project in the
file.
- Have Terraform and Helm installed on your system.
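Before starting, a quick preflight check can confirm that the command-line tools used in this walkthrough are on your PATH. This is a minimal POSIX shell sketch: the check_tools function is hypothetical, and the tool list in the usage comment (terraform, helm, az, curl, jq) is an assumption based on the commands that appear later in this recipe.

```shell
# Hypothetical preflight helper: prints each named CLI that is missing from
# PATH and returns non-zero if any of them is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tool list is an assumption based on the commands in this recipe):
# check_tools terraform helm az curl jq
```

Install whatever it reports as missing before continuing.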
Before starting, you should know which values the script expects and have them ready.
- Check out the
file to configure basic information about your deployments.
- Take a look at the
file to see which values have to be supplied during the execution of the script. These are mostly sensitive values, like passwords and access keys, so make sure you don't commit them!
Warning: The
local variable you assign should have a unique value for each stack. This ensures that the stack you create doesn't interfere with the stacks that others in your organization have created with this script.
Warning: The CIDR block used for the VPC (inside the file) should preferably be unique too. For example, if the block configured there is already in use by some VPC in your account, you can use 10.11.0.0/16 instead. However, this is not required.
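Whether two blocks clash can be checked mechanically: two IPv4 CIDR blocks overlap exactly when their network addresses agree on the shorter of the two prefixes. The sketch below implements that test in POSIX shell; the function names are hypothetical, it handles IPv4 only, and collecting the ranges already in use in your account is left to you.

```shell
# Hypothetical helpers for IPv4 only; octets must be written without
# leading zeros (shell arithmetic would read "010" as octal).

# Convert a dotted-quad address such as 10.11.0.0 into a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds (returns 0) if the two CIDR blocks share at least one address.
cidrs_overlap() {
  local a_ip=${1%/*} a_len=${1#*/} b_ip=${2%/*} b_len=${2#*/}
  local len=$a_len mask
  # Compare both networks on the shorter (less specific) prefix.
  if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
  # Keep the top $len bits; relies on 64-bit shell arithmetic (bash, dash).
  mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$a_ip") & $mask )) -eq $(( $(ip_to_int "$b_ip") & $mask )) ]
}
```

For example, `cidrs_overlap 10.10.0.0/16 10.11.0.0/16` returns non-zero (no overlap), while `cidrs_overlap 10.0.0.0/8 10.11.0.0/16` succeeds because the /8 contains the /16.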
It is not necessary to use the MLOps stacks recipes presented here alongside the ZenML framework. You can simply use the Terraform scripts directly.
However, ZenML works seamlessly with the infrastructure provisioned through these recipes. The ZenML CLI has an integration with this repository that makes it really simple to pull and deploy these recipes. A simple flow could look like the following:
Pull this recipe to your local system.
zenml stack recipe pull azureml-minimal
🎨 Customize your deployment by editing the default values in the
file.
🔐 Add your secret information, like keys and passwords, into the
file, which is not committed and only exists locally.
🚀 Deploy the recipe with this simple command.
zenml stack recipe deploy azureml-minimal
Note: If you want ZenML to automatically import the created resources as a ZenML stack, pass the
flag to the command above. By default, the imported stack has the same name as the stack recipe, and you can provide your own with the --stack-name option.
You'll notice that a ZenML stack configuration file gets created after the previous command executes 🤯! This YAML file can be imported as a ZenML stack manually by running the following commands.
zenml stack import <STACK_NAME> -f <PATH_TO_THE_CREATED_STACK_CONFIG_YAML>
# set the stack as the active stack
zenml stack set <STACK_NAME>
You need to have your local Azure CLI client logged in. Run az login if you haven't done so already.
To make the imported ZenML stack work, you'll have to create secrets that some stack components need. If you inspect the generated YAML file, you can figure out that three secrets should be created:
- for allowing access to the Azure Blob Storage Container.
Go into your imported recipe directory. It should be under
.
Run the following commands to get the storage account name and key.
```
terraform output -raw storage-account-name
terraform output -raw storage-account-key
```
Now, register your ZenML secret.
```
zenml secrets-manager secret register azureml-storage-secret \
    --schema=azure \
    --account_name=<ACCOUNT_NAME> \
    --account_key=<ACCOUNT_KEY>
```
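The two manual steps above (reading the outputs, then pasting them into the secret) can be combined so the account key never passes through your clipboard. A minimal sketch, assuming it is run from the imported recipe directory and that the Terraform outputs are named storage-account-name and storage-account-key as in the commands above; register_storage_secret is a hypothetical helper name, and terraform output -raw requires Terraform 0.14 or later.

```shell
# Hypothetical helper: read the storage credentials from Terraform state and
# register them as a ZenML secret. Run it from the recipe directory.
register_storage_secret() {
  local secret_name=${1:-azureml-storage-secret} account_name account_key
  # -raw prints the plain string without the quotes Terraform normally adds.
  account_name=$(terraform output -raw storage-account-name) || return 1
  account_key=$(terraform output -raw storage-account-key) || return 1
  zenml secrets-manager secret register "$secret_name" \
    --schema=azure \
    --account_name="$account_name" \
    --account_key="$account_key"
}
```

As with the original command, the key is still visible in the zenml process's argument list while it runs.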
If you get a ClientAuthorizationError or HTTPRequestError-(Forbidden) while trying to create secrets, grant your account the relevant Key Vault permissions with the following steps.
Get the key vault name by running the command:
terraform output key-vault-name
Find your Azure object ID. You can also get it from the error message you see.
az ad user show --id <YOUR_AZURE_EMAIL>
Set permissions for your object ID.
az keyvault set-policy --name <KEY_VAULT_NAME> --object-id <YOUR_OBJECT_ID> --secret-permissions get list set delete --key-permissions create delete get list
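The three steps above can be wrapped into one helper. This is a hedged sketch: grant_keyvault_access is a hypothetical name, it assumes you run it from the recipe directory, and it assumes a recent Azure CLI (2.37 or later, where the az ad commands moved to Microsoft Graph) in which `az ad user show` reports the object ID in the `id` field; older versions called that field `objectId`.

```shell
# Hypothetical helper combining the three steps above. Assumes Azure CLI 2.37
# or later, where `az ad user show` returns the object ID in the "id" field
# (older versions used "objectId").
grant_keyvault_access() {
  local email=${1:-} vault object_id
  if [ -z "$email" ]; then
    echo "usage: grant_keyvault_access <YOUR_AZURE_EMAIL>" >&2
    return 1
  fi
  vault=$(terraform output -raw key-vault-name) || return 1
  object_id=$(az ad user show --id "$email" --query id -o tsv) || return 1
  az keyvault set-policy --name "$vault" --object-id "$object_id" \
    --secret-permissions get list set delete \
    --key-permissions create delete get list
}
```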
To get the tracking_token for the Azure MLflow tracking server, you can run the following commands.
export CLIENT_ID=$(terraform output -raw service-principal-client-id)
export CLIENT_SECRET=$(terraform output -raw service-principal-client-secret)
export TENANT_ID=$(terraform output -raw service-principal-tenant-id)
TOKEN="$(curl -s -X POST \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}" \
-d "grant_type=client_credentials" \
-d "resource=" \
| jq -r .access_token)"
echo "$TOKEN"
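The request above posts the service principal's credentials to an Azure Active Directory OAuth 2.0 token endpoint using the client-credentials grant, then extracts access_token from the JSON response with jq (curl's -i flag would prepend the response headers, which jq cannot parse, so the sketch uses -s). The sketch assumes the Azure AD v1 endpoint https://login.microsoftonline.com/<TENANT_ID>/oauth2/token and defaults the resource to https://management.azure.com/; both are assumptions rather than values stated in this recipe, so confirm them for your setup. The token_url and fetch_token names are hypothetical.

```shell
# Hypothetical helpers. The endpoint (Azure AD v1 token endpoint) and the
# default resource (Azure Resource Manager) are assumptions, not values taken
# from this recipe.
token_url() {
  echo "https://login.microsoftonline.com/$1/oauth2/token"
}

fetch_token() {
  local resource=${1:-https://management.azure.com/} token
  if [ -z "${TENANT_ID:-}" ] || [ -z "${CLIENT_ID:-}" ] || [ -z "${CLIENT_SECRET:-}" ]; then
    echo "TENANT_ID, CLIENT_ID and CLIENT_SECRET must be set" >&2
    return 1
  fi
  # -s: no progress meter; response headers are not printed (unlike -i).
  # --fail: HTTP errors produce no body instead of an error document.
  # --data-urlencode: safely encodes secrets containing '&', '+' or '='.
  token=$(curl -s --fail "$(token_url "$TENANT_ID")" \
    --data-urlencode "client_id=${CLIENT_ID}" \
    --data-urlencode "client_secret=${CLIENT_SECRET}" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "resource=${resource}" | jq -r .access_token)
  # jq prints "null" if the field is absent and nothing if the body was empty.
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "no access_token in the response" >&2
    return 1
  fi
  echo "$token"
}
```

Usage: `TOKEN=$(fetch_token) || echo "token request failed"`. Failing loudly here avoids registering an experiment tracker with an empty or "null" token.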
While registering the experiment tracker, add the tracking token obtained in the step above.
export TRACKING_URI=$(terraform output -raw mlflow-tracking-URL)
zenml experiment-tracker register mlflow_experiment_tracker --flavor=mlflow --tracking_uri="$TRACKING_URI" --tracking_token="$TOKEN"
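If TRACKING_URI or TOKEN ended up empty (for example because an output name was mistyped or the token request failed), the register command would store an unusable value. A small guard can catch that before registering. This is a sketch: require_vars is a hypothetical helper, it also rejects the literal string null (what jq prints for a missing field), and because it uses eval you should pass it plain variable names only.

```shell
# Hypothetical guard: fail if any named variable is unset, empty, or "null".
# Arguments must be plain variable names, since they are passed to eval.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "error: $name is unset, empty, or null" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_vars TRACKING_URI TOKEN &&
#   zenml experiment-tracker register mlflow_experiment_tracker --flavor=mlflow \
#     --tracking_uri="$TRACKING_URI" --tracking_token="$TOKEN"
```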
After it runs, the script outputs the following values.

| Output | Description |
| --- | --- |
| subscription-id | The Azure subscription ID. |
| resource-group-name | Name of the resource group that is created. |
| resource-group-location | Location of the resource group that is created. |
| azureml-compute-cluster-name | Name of the compute cluster that is created in the AzureML workspace. |
| azureml-workspace-name | Name of the AzureML workspace that is created. |
| blobstorage-container-path | The Azure Blob Storage Container path for storing your artifacts. |
| storage-account-name | The name of the Azure Blob Storage account. |
| storage-account-connection-string | The Azure Blob Storage account connection string. |
| mlflow-tracking-URI | The URL for the MLflow tracking server. |
| key-vault-name | The name of the Azure Key Vault that is created. |
| service-principal-id | The ID of the created service principal. |
| service-principal-client-id | The client ID of the created service principal. |
| service-principal-tenant-id | The tenant ID of the created service principal. |
| service-principal-client-secret | The password (client secret) of the created service principal. |
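Rather than running terraform output once per value, the outputs in the table can be exported as environment variables in a loop. In this sketch (to_env_name and export_outputs are hypothetical helpers), each name is upper-cased with hyphens turned into underscores, so resource-group-name becomes RESOURCE_GROUP_NAME; note that these names differ from the shorter ones (such as RESOURCE_GROUP) used in the script further down.

```shell
# Hypothetical helpers: map a Terraform output name such as
# "resource-group-name" to an environment variable name, then export the
# listed outputs in one go. Run from the recipe directory.
to_env_name() {
  # Lower-case letters become upper-case; '-' becomes '_'.
  printf '%s\n' "$1" | tr 'a-z-' 'A-Z_'
}

export_outputs() {
  local out value
  for out in "$@"; do
    value=$(terraform output -raw "$out") || return 1
    export "$(to_env_name "$out")=$value"
  done
}

# Usage:
# export_outputs subscription-id resource-group-name key-vault-name
```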
Sensitive outputs are not shown directly in the logs. To view the full list of outputs, run the following command.
terraform output
To view an individual sensitive output, use the following format. Here, the service principal's client secret is being obtained.
terraform output service-principal-client-secret
Using the ZenML stack recipe CLI commands, you can run the following commands to delete your resources and optionally clean up the recipe files that you had downloaded to your local system.
🗑️ Run the destroy command which removes all resources and their dependencies from the cloud.
zenml stack recipe destroy azureml-minimal
(Optional) 🧹 Clean up all stack recipe files that you had pulled to your local system.
zenml stack recipe clean
As mentioned above, you can still use the recipe without using the zenml stack recipe CLI commands, or even without installing ZenML. Since each recipe is a group of Terraform modules, you can simply use the terraform CLI to run the apply and destroy operations.
🎨 Customize your deployment by editing the default values in the
file.
🔐 Add your secret information, like keys and passwords, into the
file, which is not committed and only exists locally.
Initialize Terraform modules and download provider definitions.
terraform init
Apply the recipe.
terraform apply
🗑️ Run the destroy function to clean up all resources.
terraform destroy
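When you drive Terraform directly, a common safeguard is to save the plan to a file and apply exactly that file after confirming it, so that what gets applied is what you reviewed. A minimal sketch; plan_and_apply and the default tfplan file name are hypothetical.

```shell
# Hypothetical wrapper: init, save a plan file, ask for confirmation, and
# apply only that saved plan. Anything other than y/Y/yes aborts.
plan_and_apply() {
  local plan_file=${1:-tfplan} answer
  terraform init -input=false || return 1
  terraform plan -input=false -out="$plan_file" || return 1
  printf 'Apply this plan? [y/N] '
  read -r answer || answer=
  case $answer in
    y|Y|yes) terraform apply -input=false "$plan_file" ;;
    *) echo "aborted; nothing applied" ;;
  esac
}
```

Applying a saved plan file does not prompt again, which is why the confirmation happens before the apply step.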
Create stack to run on Azure
Note: The recommended way to register the resources deployed here as a ZenML stack is to import the generated YAML file. However, because the tracking token is fetched after the stack is registered, you then have to update the experiment tracker component to include it. Alternatively, you can register every component manually, as in the following script.
# zenml setup
export STACK_PROFILE="azureml-mlflow"
# azure credentials
export SUBSCRIPTION_ID=$(terraform output -raw subscription-id)
export CLIENT_ID=$(terraform output -raw service-principal-client-id)
export CLIENT_SECRET=$(terraform output -raw service-principal-client-secret)
export TENANT_ID=$(terraform output -raw service-principal-tenant-id)
# azure resource group
export RESOURCE_GROUP=$(terraform output -raw resource-group-name)
export REGION=$(terraform output -raw resource-group-location)
# azure storage
export AZURE_STORAGE_CONNECTION_STRING=$(terraform output -raw storage-account-connection-string)
export CONTAINER_PATH=$(terraform output -raw blobstorage-container-path)
# azureml
export WORKSPACE_NAME=$(terraform output -raw azureml-workspace-name)
export CLUSTER_NAME=$(terraform output -raw azureml-compute-cluster-name)
export KEY_VAULT_NAME=$(terraform output -raw key-vault-name)
# azure mlflow
export TRACKING_URI=$(terraform output -raw mlflow-tracking-URL)
TOKEN="$(curl -s -X POST \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}" \
-d "grant_type=client_credentials" \
-d "resource=" \
| jq -r .access_token)"
echo "$TOKEN"
zenml clean
zenml init
zenml artifact-store register azure_store \
--flavor=azure \
--path="$CONTAINER_PATH"
zenml secrets-manager register azure_secrets_manager \
--flavor=azure \
--key_vault_name="$KEY_VAULT_NAME"
zenml experiment-tracker register azureml_mlflow_experiment_tracker --flavor=mlflow --tracking_uri="$TRACKING_URI" --tracking_token="$TOKEN"
zenml step-operator register azureml \
--flavor=azureml \
--subscription_id="$SUBSCRIPTION_ID" \
--resource_group="$RESOURCE_GROUP" \
--workspace_name="$WORKSPACE_NAME" \
--compute_target_name="$CLUSTER_NAME"
zenml stack register azureml_stack \
-o default \
-a azure_store \
-s azureml \
-x azure_secrets_manager \
-e azureml_mlflow_experiment_tracker
zenml stack set azureml_stack
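Azure AD access tokens are short-lived (typically valid for roughly an hour), so the token stored in the experiment tracker will eventually stop working, and, as the note above says, a stack imported from the generated YAML needs the token added after the fact. Assuming your ZenML version provides `zenml experiment-tracker update` and that it accepts the same --tracking_token option as `register` (this command is not used elsewhere in this recipe, so check `zenml experiment-tracker --help`), a small hypothetical helper could refresh it with a newly fetched token:

```shell
# Hypothetical helper. Assumes `zenml experiment-tracker update` accepts the
# same --tracking_token option as `register`; verify for your ZenML version.
refresh_tracking_token() {
  local tracker=${1:-} token=${2:-}
  if [ -z "$tracker" ] || [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "usage: refresh_tracking_token <tracker-name> <token>" >&2
    return 1
  fi
  zenml experiment-tracker update "$tracker" --tracking_token="$token"
}

# Usage, after re-running the token request above:
# refresh_tracking_token azureml_mlflow_experiment_tracker "$TOKEN"
```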