This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Commit

Renamed all 'validation set' to 'test set' for consistency.
fa9r committed May 12, 2022
1 parent ec635d3 commit 4094fa7
Showing 4 changed files with 13 additions and 13 deletions.
6 changes: 3 additions & 3 deletions 2-1_Experiment_Tracking.ipynb
@@ -77,7 +77,7 @@
"\n",
"[MLflow](https://mlflow.org/) is an amazing open-source MLOps platform that provides powerful tools to handle various ML lifecycle steps, such as experiment tracking, code packaging, model deployment, and more. In this lesson, we will focus on the [MLflow Tracking](https://mlflow.org/docs/latest/tracking.html) component, but we will learn about other MLflow components in later lessons.\n",
"\n",
"To integrate the MLFlow experiment tracker into our previously defined ZenML pipeline, we only need to adjust the `svc_trainer` step. Let us define a new `svc_trainer_mlflow` step in which we use MLflow's [`mlflow.sklearn.autolog()`](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog) feature to automatically log all relevant attributes of our model to MLflow. By adding an `@enable_mlflow` decorator on top of the function, ZenML then automatically initializes MLflow and takes care of the rest for us.\n",
"To integrate the MLFlow experiment tracker into our previously defined ZenML pipeline, we only need to adjust the `svc_trainer` step. Let us define a new `svc_trainer_mlflow` step in which we use MLflow's [`mlflow.sklearn.autolog()`](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog) feature to automatically log all relevant attributes and metrics of our model to MLflow. By adding an `@enable_mlflow` decorator on top of the function, ZenML then automatically initializes MLflow and takes care of the rest for us.\n",
"\n",
"The following function creates such a step, parametrized by the SVC hyperparameter `gamma`, then returns a corresponding ML pipeline."
]
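
For reference, a minimal sketch of what such a step might look like. The ZenML import paths and the step signature are assumptions based on the 2022-era ZenML API and may differ in your version; the notebook's actual function additionally wraps the step in a pipeline parametrized by `gamma`.

```python
import mlflow
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC

# Assumed import path for the 2022-era ZenML MLflow integration.
from zenml.integrations.mlflow.mlflow_step_decorator import enable_mlflow
from zenml.steps import step


@enable_mlflow  # ZenML initializes MLflow and links the run to this pipeline run
@step
def svc_trainer_mlflow(
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train an SVC classifier; autolog records params, metrics, and the model."""
    mlflow.sklearn.autolog()
    model = SVC(gamma=1e-3)
    model.fit(X_train, y_train)
    return model
```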
@@ -220,7 +220,7 @@
"\n",
"![MLflow UI](_assets/2-1/mlflow_ui.png)\n",
"\n",
"Click on the `Parameters >` tab on top of the table to see *all* hyperparameters of your model. Now you can see at a glance which model performed best and which hyperparameters changed between different runs. In our case, we can see that the SVC model with `gamma=0.001` achieved the best validation accuracy of `0.969`.\n",
"Click on the `Parameters >` tab on top of the table to see *all* hyperparameters of your model. Now you can see at a glance which model performed best and which hyperparameters changed between different runs. In our case, we can see that the SVC model with `gamma=0.001` achieved the best test accuracy of `0.969`.\n",
"\n",
"If we click on one of the links in the `Start Time` column, we can see additional details of the respective run. In particular, under the `Artifacts` tab, we can find a `model.pkl` file, which we could now use to deploy our model in an inference/production environment. In the next lesson, `2-2_Local_Deployment.ipynb`, we will learn how to do this automatically as part of our pipelines with the [MLflow Models](https://mlflow.org/docs/latest/models.html) component."
]
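
For instance, a logged model can be loaded straight from a run for a quick local sanity check. A sketch, assuming `X_test` from our data split; the run ID placeholder is hypothetical:

```python
import mlflow

# Replace <RUN_ID> with a run ID copied from the MLflow UI.
model = mlflow.sklearn.load_model("runs:/<RUN_ID>/model")
print(model.predict(X_test[:1]))  # predict on one test-set sample
```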
@@ -275,7 +275,7 @@
"\n",
"The main difference to the MLflow example before is that W&B has no sklearn autolog functionality. Instead, we need to call `wandb.log(...)` for each value we want to log to Weights & Biases.\n",
"\n",
"Since we also want to log our validation score, we need to adjust our `evaluator` step accordingly as well.\n",
"Since we also want to log our test score, we need to adjust our `evaluator` step accordingly as well.\n",
"\n",
"Note that, despite wandb being used in different steps within a pipeline, ZenML handles initializing wandb and ensures that the experiment name is the same as the pipeline name and that the experiment run name is the same as the pipeline run name. This establishes a lineage between pipelines in ZenML and experiments in wandb."
]
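
A minimal sketch of what the adjusted `evaluator` step might look like. The `enable_wandb` decorator path and the step signature are assumptions based on the 2022-era ZenML wandb integration:

```python
import numpy as np
import wandb
from sklearn.base import ClassifierMixin

# Assumed import path for the 2022-era ZenML wandb integration.
from zenml.integrations.wandb.wandb_step_decorator import enable_wandb
from zenml.steps import step


@enable_wandb  # ZenML initializes wandb with the pipeline name and run name
@step
def evaluator(
    X_test: np.ndarray,
    y_test: np.ndarray,
    model: ClassifierMixin,
) -> float:
    """Compute the test accuracy and log it to Weights & Biases."""
    test_acc = model.score(X_test, y_test)
    wandb.log({"test_acc": test_acc})  # one wandb.log(...) call per logged value
    return test_acc
```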
10 changes: 5 additions & 5 deletions 2-2_Local_Deployment.ipynb
@@ -34,7 +34,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"ZenML provides a standard step for deployment to MLflow, so we don't need to write any code ourselves. To deploy our model after training it, all we need to do is to add the `mlflow_model_deployer_step` into our pipeline. In addition to the trained model, this step expects a boolean argument of whether to deploy the model or not. This is very useful in practice, as it allows you to define some requirements for deploying your models, i.e., that it performs better than the currently deployed model, or that no data drift is happening. For now, let us define a `deployment_trigger` that only deploys a model if the validation accuracy is over 90%:"
"ZenML provides a standard step for deployment to MLflow, so we don't need to write any code ourselves. To deploy our model after training it, all we need to do is to add the `mlflow_model_deployer_step` into our pipeline. In addition to the trained model, this step expects a boolean argument of whether to deploy the model or not. This is very useful in practice, as it allows you to define some requirements for deploying your models, i.e., that it performs better than the currently deployed model, or that no data drift is happening. For now, let us define a `deployment_trigger` that only deploys a model if the test accuracy is over 90%:"
]
},
{
@@ -49,9 +49,9 @@
"\n",
"\n",
"@step\n",
"def deployment_trigger(val_acc: float) -> bool:\n",
" \"\"\"Only deploy if the validation accuracy > 90%.\"\"\"\n",
" return val_acc > 0.9\n",
"def deployment_trigger(test_acc: float) -> bool:\n",
" \"\"\"Only deploy if the test accuracy > 90%.\"\"\"\n",
" return test_acc > 0.9\n",
"\n",
"\n",
"@pipeline(enable_cache=False)\n",
@@ -169,7 +169,7 @@
"source": [
"Let's play with it a bit and send it a request. \n",
"\n",
"First, let's query the artifact store to get a sample from the validation set of our last run."
"First, let's query the artifact store to get a sample from the test set of our last run."
]
},
{
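
As a sketch of what such a request might look like once we have a `sample` array from the artifact store: the endpoint URL, the sample shape (assuming the sklearn digits data used in this course), and the JSON payload layout are all assumptions; the exact scoring format depends on your MLflow version.

```python
import json

import numpy as np
import requests

# Hypothetical endpoint; the deployer step prints the real prediction URL.
prediction_url = "http://127.0.0.1:8000/invocations"

sample = np.zeros((1, 64))  # stand-in for one flattened digits image

# MLflow 1.x scoring servers accept pandas-split JSON; newer versions expect
# {"instances": [...]} or {"dataframe_split": {...}} instead.
payload = json.dumps({"columns": list(range(64)), "data": sample.tolist()})
response = requests.post(
    prediction_url,
    data=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(response.json())
```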
4 changes: 2 additions & 2 deletions 3-1_Data_Skew.ipynb
@@ -46,7 +46,7 @@
"source": [
"## Detect Train-Test Skew\n",
"\n",
"To start out, we will use Evidently to check for skew between our training and validation datasets. To do so, we will define a new pipeline with an Evidently step, into which we will then pass our training and validation datasets as . \n",
"To start out, we will use Evidently to check for skew between our training and test datasets. To do so, we will define a new pipeline with an Evidently step, into which we will then pass our training and test datasets. \n",
"\n",
"At its core, Evidently’s distribution difference calculation functions take in a reference dataset and compare it with a separate comparison dataset. These are both passed in as [pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), though CSV inputs are also possible. ZenML implements this functionality in the form of several standardized steps along with an easy way to use the visualization tools also provided along with Evidently as ‘Dashboards’.\n",
"\n",
Expand Down Expand Up @@ -205,7 +205,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we see, there is no skew between our training and validation sets. That's great!\n",
"As we see, there is no skew between our training and test sets. That's great!\n",
"\n",
"In the next lessons, we will add mechanism for training-serving skew and data drift detection into our inference pipelines and will set up automated alerts whenever any data issues were detected. Those lessons are still work in progress, so stay tuned!"
]
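
Under the hood, ZenML's standardized steps wrap the Evidently API described above. A rough sketch of the direct (v0.1-era) usage, assuming `train_df` and `test_df` DataFrames for the reference and comparison datasets; import paths and tab arguments vary across Evidently versions:

```python
import pandas as pd
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab

# Placeholder inputs for the sketch: the pipeline's training and test splits.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# Compare the comparison dataset against the reference dataset.
dashboard = Dashboard(tabs=[DataDriftTab()])
dashboard.calculate(reference_data=train_df, current_data=test_df)
dashboard.save("train_test_skew_report.html")
```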
6 changes: 3 additions & 3 deletions steps/deployment_trigger.py
@@ -2,6 +2,6 @@


@step
-def deployment_trigger(val_acc: float) -> bool:
-    """Only deploy if the validation accuracy > 90%."""
-    return val_acc > 0.9
+def deployment_trigger(test_acc: float) -> bool:
+    """Only deploy if the test accuracy > 90%."""
+    return test_acc > 0.9
