(WIP) MLOps on Vertex AI

This example implements the end-to-end MLOps process using the Vertex AI platform and Smart Analytics technology capabilities. The example uses Keras to implement the ML model, TFX to implement the training pipeline, and the Model Builder SDK to interact with Vertex AI.

MLOps lifecycle

Getting started

  1. Set up the MLOps environment on Google Cloud.
  2. Start your AI Notebook instance.
  3. Open JupyterLab, then open a new terminal.
  4. Clone the repository to your AI Notebook instance:
    git clone https://github.com/ksalama/ucaip-labs.git
    cd ucaip-labs
    
  5. Run the following commands to install the required packages:
    pip install tfx==0.30.0
    pip install tensorflow==2.4.1
    pip install -r requirements.txt
    

Dataset Management

The Chicago Taxi Trips dataset is one of the public datasets hosted on BigQuery. It includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip > 20%.
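To make the prediction target concrete, here is a minimal sketch of how a binary label could be derived from a trip record. It assumes that "tip > 20%" means the tip amount divided by the fare; the README does not say which denominator the notebooks use, so the function name `tip_label` and the handling of zero fares are illustrative assumptions.

```python
def tip_label(tips: float, fare: float, threshold: float = 0.2) -> int:
    """Return 1 if the tip exceeds `threshold` of the fare, else 0.

    Assumption: the ratio is tips / fare. The notebooks may define it differently.
    """
    if fare <= 0:
        # No meaningful ratio for zero or negative fares; treat as the negative class.
        return 0
    return int(tips / fare > threshold)


print(tip_label(tips=3.0, fare=10.0))  # a 30% tip
print(tip_label(tips=1.0, fare=10.0))  # a 10% tip
```

The comparison is strict, so a tip of exactly 20% falls into the negative class under this reading.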

The 01-dataset-management notebook covers:

  1. Performing exploratory data analysis on the data in BigQuery.
  2. Creating a managed Vertex AI Dataset using the Python SDK.
  3. Generating the schema for the raw data using TensorFlow Data Validation.

ML Development

We experiment with creating a Custom Model using the 02-experimentation notebook, which covers:

  1. Preparing the data using Dataflow.
  2. Implementing a Keras classification model.
  3. Training the Keras model in Vertex AI using a pre-built container.
  4. Uploading the exported model from Cloud Storage to Vertex AI as a Model.
  5. Extracting and visualizing experiment parameters from Vertex AI Metadata.

We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.

In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.

Training Operationalization

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

  1. Receive hyperparameters using the hyperparam_gen custom Python component.
  2. Extract data from BigQuery using BigQueryExampleGen.
  3. Validate the raw data using StatisticsGen and ExampleValidator.
  4. Process the data using Transform.
  5. Train a custom model using Trainer.
  6. Evaluate and validate the custom model using ModelEvaluator.
  7. Save the blessed model to the model registry location using Pusher.
  8. Upload the model to Vertex AI using the aip_model_pusher custom Python component.
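Steps 6 and 7 above imply a gate: only a model that passes validation (a "blessed" model) is pushed. The sketch below illustrates that idea in plain Python. It is a stand-in for the concept, not the TFX Evaluator or Pusher API, and the metric names, thresholds, and the `push_if_blessed` helper are hypothetical.

```python
from typing import Callable, Dict


def push_if_blessed(
    metrics: Dict[str, float],
    thresholds: Dict[str, float],
    push: Callable[[], None],
) -> bool:
    """Call `push` only if every thresholded metric meets its minimum value."""
    blessed = all(
        # A metric that was never computed counts as failing its threshold.
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )
    if blessed:
        push()
    return blessed


# Hypothetical evaluation results and validation threshold.
pushed_models = []
result = push_if_blessed(
    metrics={"accuracy": 0.91},
    thresholds={"accuracy": 0.85},
    push=lambda: pushed_models.append("model-v1"),
)
print(result, pushed_models)
```

Passing the push action as a callable keeps the gate testable without touching any storage location.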

The 04-pipeline-deployment notebook covers testing, compiling, and running the pipeline locally and using Vertex AI Pipelines.

Continuous Training

After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed in response to a trigger. We use Cloud Functions and Cloud Pub/Sub as the triggering mechanism.

The 05-continuous-training notebook covers the following steps:

  1. Create the Cloud Pub/Sub topic.
  2. Deploy the Cloud Function, which is implemented in src/pipeline_triggering.
  3. Test triggering a pipeline.
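As a rough illustration of the triggering mechanism, the sketch below shows a handler shaped like a Pub/Sub-triggered background Cloud Function. Pub/Sub delivers the message body base64-encoded in the event's `data` field; the handler decodes it into pipeline parameters. The `submit` argument, the parameter names, and the JSON payload format are assumptions made for this example, and the actual code in src/pipeline_triggering may differ.

```python
import base64
import json
from typing import Any, Callable, Dict, Optional


def trigger_pipeline(
    event: Dict[str, Any],
    context: Any = None,
    submit: Optional[Callable[[Dict[str, Any]], None]] = None,
) -> Dict[str, Any]:
    """Decode a Pub/Sub message into pipeline parameters and hand them to `submit`."""
    raw = event.get("data")
    if not raw:
        raise ValueError("Pub/Sub message has no data payload")
    # Pub/Sub message data arrives base64-encoded; here it is assumed to be JSON.
    parameters = json.loads(base64.b64decode(raw).decode("utf-8"))
    if submit is not None:
        # `submit` is a hypothetical hook that would start the pipeline run.
        submit(parameters)
    return parameters


# Simulate the message a publisher would send to the topic.
message = {"num_epochs": 5, "learning_rate": 0.001}
event = {"data": base64.b64encode(json.dumps(message).encode("utf-8")).decode("utf-8")}
print(trigger_pipeline(event))
```

Keeping the submission step behind a callable lets the decoding logic be tested locally before the function is deployed.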

Model Deployment

We use Cloud Build to test and deploy the uploaded model to Vertex AI Prediction. The 06-model-deployment notebook configures and executes the build/model-deployment.yaml file with the following steps:

  1. Create a Vertex AI Endpoint.
  2. Test the model interface.
  3. Create an endpoint in Vertex AI.
  4. Deploy the model to the endpoint.
  5. Test the endpoint.

Prediction Serving

We serve the deployed model for prediction. The 07-prediction-serving notebook covers:

  1. Use the endpoint for online prediction.
  2. Use the uploaded model for batch prediction.
  3. Run the batch prediction using Vertex AI Pipelines.
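For online prediction, a Vertex AI endpoint expects a request whose body carries a list of `instances`. The sketch below builds such a body with the standard library only. The feature names are hypothetical placeholders; the actual input signature depends on the model exported by the training pipeline.

```python
import json
from typing import Any, Dict, List


def build_predict_request(instances: List[Dict[str, Any]]) -> str:
    """Serialize instances into the JSON body of an online prediction request."""
    if not instances:
        raise ValueError("At least one instance is required")
    return json.dumps({"instances": instances})


# Hypothetical feature names; replace them with the model's real input features.
body = build_predict_request(
    [{"trip_miles": 2.5, "trip_seconds": 600, "payment_type": "Credit Card"}]
)
print(body)
```

The same list of instances can be passed to the Python SDK's `Endpoint.predict(instances=...)` instead of being serialized by hand.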

Model Monitoring

After a model is deployed for prediction serving, continuous monitoring is set up to ensure that the model continues to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection:

  1. Set the skew and drift thresholds.
  2. Create a monitoring job for all the models under an endpoint.
  3. List the monitoring jobs.
  4. List artifacts produced by the monitoring job.
  5. Pause and delete the monitoring job.
  5. Pause and delete the monitoring job.
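To show what a drift threshold means in practice, the sketch below compares the distribution of a categorical feature in two samples using the L-infinity distance, which is the statistic Vertex AI Model Monitoring documents for categorical features. This is a local illustration under assumptions, not the service's implementation: the function names, sample values, and thresholds are made up for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def category_distribution(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return the relative frequency of each category."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("Cannot compute a distribution from no values")
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline: Dict[Hashable, float], current: Dict[Hashable, float]) -> float:
    """Largest absolute difference in frequency across all categories."""
    categories = set(baseline) | set(current)
    return max(abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories)


def drift_detected(baseline_values, current_values, threshold: float) -> bool:
    """Flag drift when the distance exceeds the threshold (an assumed alert rule)."""
    distance = l_infinity_distance(
        category_distribution(baseline_values),
        category_distribution(current_values),
    )
    return distance > threshold


baseline = ["Cash", "Cash", "Credit Card", "Credit Card"]
serving = ["Cash", "Credit Card", "Credit Card", "Credit Card"]
print(drift_detected(baseline, serving, threshold=0.1))
```

Skew detection follows the same pattern with the training data as the baseline, while drift detection compares serving data across time windows.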

Metadata Tracking

You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex AI Pipelines, in the Cloud Console.

Disclaimer

This is not an official Google product but sample code provided for educational purposes.


Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
