openvaccine-kaggle-competition

Objective

This example is based on the OpenVaccine Kaggle competition (https://www.kaggle.com/c/stanford-covid-vaccine). The objective of this exercise is to develop models and design rules for RNA degradation.

Environment

This pipeline was tested with Kubeflow 1.4 and kfp 1.1.2 on x86-64 and ARM-based systems, which covers Intel and AMD CPUs as well as M1/M2-series MacBooks.

Step 1: Setup Kubeflow as a Service

If you haven’t already, sign up (https://www.arrikto.com/kubeflow-as-a-service/)
Deploy Kubeflow

Step 2: Launch a Notebook Server

Bump memory to 2GB and vCPUs to 2

Step 3: Clone the Project Repo to Your Notebook

(Kubeflow as a Service) Open up a terminal in the Notebook Server and git clone the kubeflow/examples repository

git clone https://github.com/kubeflow/examples

Step 4: Setup DockerHub and Docker

If you haven’t already, sign up (https://hub.docker.com/) for DockerHub
If you haven’t already, install Docker Desktop locally (https://www.docker.com/products/docker-desktop/) OR install the Docker command line utility (https://docs.docker.com/get-docker/)
Run the sudo docker login command in your terminal and log in to Docker with your DockerHub username and password

Step 5: Setup Kaggle

If you haven’t already done so, sign up (https://www.kaggle.com/) for Kaggle
(On Kaggle) Generate an API token (https://www.kaggle.com/docs/api)
(Kubeflow as a Service) Create a Kubernetes secret

kubectl create secret generic kaggle-secret --from-literal=KAGGLE_USERNAME=<username> --from-literal=KAGGLE_KEY=<api_token>
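For reference, kubectl create secret generic stores each --from-literal value base64-encoded under the Secret's data field. The sketch below builds an equivalent manifest in Python, which can be handy if you prefer to keep the secret definition in a script. The helper name build_kaggle_secret and the example credentials are illustrative and are not part of the repository.

```python
import base64
import json


def build_kaggle_secret(username: str, api_token: str, name: str = "kaggle-secret") -> dict:
    """Return a Secret manifest equivalent to the kubectl command above."""

    def b64(value: str) -> str:
        # Kubernetes expects the values under `data` to be base64-encoded strings.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            "KAGGLE_USERNAME": b64(username),
            "KAGGLE_KEY": b64(api_token),
        },
    }


if __name__ == "__main__":
    # Placeholder credentials; substitute your own Kaggle username and API token.
    print(json.dumps(build_kaggle_secret("example-user", "example-token"), indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be saved to a file and applied with kubectl apply -f. Note that base64 is an encoding, not encryption, so treat the resulting file as sensitive.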

Step 6: Install Git

(Locally) If you don’t have it already, install Git (https://github.com/git-guides/install-git)

Step 7: Clone the Project Repo Locally

(Locally) Git clone the kubeflow/examples repository

git clone https://github.com/kubeflow/examples

Step 8: Create a PodDefault Resource

(Kubeflow as a Service) Navigate to the openvaccine-kaggle-competition directory
Create a resource.yaml file

resource.yaml:

apiVersion: "kubeflow.org/v1alpha1"
kind: PodDefault
metadata:
  name: kaggle-access
spec:
 selector:
  matchLabels:
    kaggle-secret: "true"
 desc: "kaggle-access"
 volumeMounts:
 - name: secret-volume
   mountPath: /secret/kaggle
 volumes:
 - name: secret-volume
   secret:
    secretName: kaggle-secret

Apply the created resource using: kubectl apply -f resource.yaml
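With this PodDefault applied, pods that carry the label kaggle-secret: "true" get the secret mounted at /secret/kaggle, and Kubernetes exposes each key of the secret as a file in that directory. The following is a minimal sketch of how a pipeline step could read those files. The function names are hypothetical, and the repository's load.py may handle credentials differently.

```python
import os
from pathlib import Path


def load_kaggle_credentials(secret_dir: str = "/secret/kaggle") -> dict:
    """Read KAGGLE_USERNAME and KAGGLE_KEY from a mounted secret volume."""
    creds = {}
    for key in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
        path = Path(secret_dir) / key
        if not path.is_file():
            raise FileNotFoundError(f"missing secret file: {path}")
        # Strip a trailing newline in case the value was written with one.
        creds[key] = path.read_text(encoding="utf-8").strip()
    return creds


def export_to_environment(creds: dict) -> None:
    """Copy the credentials into environment variables of the current process."""
    os.environ.update(creds)
```

The Kaggle API client can pick up credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables, so calling export_to_environment(load_kaggle_credentials()) before importing the client is one way to authenticate inside the container.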

Step 9: Explore the `load-data` directory

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/load-data directory
Open up the load.py file
Note the code in this file that will perform the actions required in the “load-data” pipeline step

Step 10: Build the `load-data` Docker Image

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/load-data directory
Build the Docker image if locally you are using arm64 (Apple M1)

docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .

OR build the Docker image if locally you are using amd64

docker build -t <docker_username>/<docker_imagename>:<tag> .

Step 11: Push the `load-data` Docker Image to DockerHub

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/load-data directory
Push the Docker image if locally you are using arm64 (Apple M1)

docker push <docker_username>/<docker_imagename>:<tag>-amd64

OR push the Docker image if locally you are using amd64

docker push <docker_username>/<docker_imagename>:<tag>
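The build and push steps differ only in the --platform flag and the -amd64 tag suffix used on ARM machines. As a rough sketch, the helper below picks the right pair of commands based on the local CPU architecture. The function name docker_commands and the example arguments are assumptions for illustration, and the same helper would apply to the other pipeline component images.

```python
import platform
import shlex

# Values reported by platform.machine() on ARM hosts (Apple Silicon, Linux ARM).
ARM_MACHINES = {"arm64", "aarch64"}


def docker_commands(username: str, image: str, tag: str, machine: str = None) -> list:
    """Return the docker build and docker push commands as argument lists."""
    machine = (machine or platform.machine()).lower()
    on_arm = machine in ARM_MACHINES
    ref = f"{username}/{image}:{tag}" + ("-amd64" if on_arm else "")
    build = ["docker", "build"]
    if on_arm:
        # Build an amd64 image even though the local machine is ARM.
        build.append("--platform=linux/amd64")
    build += ["-t", ref, "."]
    return [build, ["docker", "push", ref]]


if __name__ == "__main__":
    # Placeholder names; substitute your DockerHub username, image name and tag.
    for command in docker_commands("example-user", "load-data", "v1"):
        print(shlex.join(command))
```

The script only prints the commands. Run them from the component directory, or pass each argument list to subprocess.run if you want to automate the step.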

Step 12: Explore the `preprocess-data` directory

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/preprocess-data directory
Open up the preprocess.py file
Note the code in this file that will perform the actions required in the “preprocess” pipeline step

Step 13: Build the `preprocess-data` Docker Image

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/preprocess-data directory
Build the Docker image if locally you are using arm64 (Apple M1)

docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .

OR build the Docker image if locally you are using amd64

docker build -t <docker_username>/<docker_imagename>:<tag> .

Step 14: Push the `preprocess-data` Docker Image to DockerHub

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/preprocess-data directory
Push the Docker image if locally you are using arm64 (Apple M1)

docker push <docker_username>/<docker_imagename>:<tag>-amd64

OR push the Docker image if locally you are using amd64

docker push <docker_username>/<docker_imagename>:<tag>

Step 15: Explore the `model-training` directory

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/model-training directory
Open up the model.py file
Note the code in this file that will perform the actions required in the “train” pipeline step

Step 16: Build the `model-training` Docker Image

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/model-training directory
Build the Docker image if locally you are using arm64 (Apple M1)

docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .

OR build the Docker image if locally you are using amd64

docker build -t <docker_username>/<docker_imagename>:<tag> .

Step 17: Push the `model-training` Docker Image to DockerHub

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/model-training directory
Push the Docker image if locally you are using arm64 (Apple M1)

docker push <docker_username>/<docker_imagename>:<tag>-amd64

OR push the Docker image if locally you are using amd64

docker push <docker_username>/<docker_imagename>:<tag>

Step 18: Explore the `model-evaluation` directory

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/model-evaluation directory
Open up the eval.py file
Note the code in this file that will perform the actions required in the “test” pipeline step

Step 19: Build the `model-evaluation` Docker Image

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/model-evaluation directory
Build the Docker image if locally you are using arm64 (Apple M1)

docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .

OR build the Docker image if locally you are using amd64

docker build -t <docker_username>/<docker_imagename>:<tag> .

Step 20: Push the `model-evaluation` Docker Image to DockerHub

(Locally) Navigate to the openvaccine-kaggle-competition/pipeline-components/model-evaluation directory
Push the Docker image if locally you are using arm64 (Apple M1)

docker push <docker_username>/<docker_imagename>:<tag>-amd64

OR push the Docker image if locally you are using amd64

docker push <docker_username>/<docker_imagename>:<tag>

Step 21: Modify the openvaccine-kaggle-competition-kfp.py file

(Kubeflow as a Service) Navigate to the openvaccine-kaggle-competition directory
Update the openvaccine-kaggle-competition-kfp.py file with the Docker images you pushed in the previous steps

    return dsl.ContainerOp(
        name = 'load-data',
        image = '<dockerhub username>/<image name>:<tag>',

------

def GetMsg(comp1):
    return dsl.ContainerOp(
        name = 'preprocess',
        image = '<dockerhub username>/<image name>:<tag>',

------

def Train(comp2, trial, epoch, batchsize, embeddim, hiddendim, dropout, spdropout, trainsequencelength):
    return dsl.ContainerOp(
        name = 'train',
        image = '<dockerhub username>/<image name>:<tag>',

------

def Eval(comp1, trial, epoch, batchsize, embeddim, hiddendim, dropout, spdropout, trainsequencelength):
    return dsl.ContainerOp(
        name = 'Evaluate',
        image = '<dockerhub username>/<image name>:<tag>',
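It is easy to miss one of the four image arguments. As an optional check, the sketch below scans Python source text for image values that still contain an angle-bracket placeholder. The helper name find_placeholder_images and the regular expression are illustrative assumptions, not part of the repository.

```python
import re

# Matches image = '...' (or "...") where the value still contains a <placeholder>.
PLACEHOLDER = re.compile(r"image\s*=\s*['\"]([^'\"]*<[^>]+>[^'\"]*)['\"]")


def find_placeholder_images(source: str) -> list:
    """Return (line_number, image_value) pairs for image arguments not yet filled in."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = PLACEHOLDER.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Small inline sample; in practice, pass the text of the pipeline file instead.
    sample = (
        "name = 'load-data',\n"
        "image = '<dockerhub username>/<image name>:<tag>',\n"
        "image = 'example-user/preprocess-data:v1',\n"
    )
    for number, value in find_placeholder_images(sample):
        print(f"line {number}: placeholder image {value}")
```

To check the real file, read openvaccine-kaggle-competition-kfp.py as text and pass its contents to find_placeholder_images. An empty result means every image argument has been replaced.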

Step 22: Generate a KFP Pipeline yaml File

(Locally) Navigate to the openvaccine-kaggle-competition directory and delete the existing openvaccine-kaggle-competition-kfp.yaml file
(Kubeflow as a Service) Navigate to the openvaccine-kaggle-competition directory

Build a Python virtual environment:

Step a) Update pip

python3 -m pip install --upgrade pip

Step b) Install virtualenv

sudo pip3 install virtualenv

Step c) Check the installed version of virtualenv

virtualenv --version

Step d) Create a virtual environment named kfp

virtualenv kfp

Step e) Activate your venv.

source kfp/bin/activate

The virtual environment is now active. With it active, install the following packages:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y git python3-pip

python3 -m pip install  kfp==1.1.2
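Because this pipeline was tested with kfp 1.1.2, it can be worth confirming that the active environment really has that version before compiling. The following sketch uses only the standard library. The function names are hypothetical, and the lookup parameter exists only to make the check easy to exercise without kfp installed.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_kfp(expected: str = "1.1.2", lookup=installed_version) -> str:
    """Describe whether the kfp version in this environment matches the tested one."""
    found = lookup("kfp")
    if found is None:
        return "kfp is not installed in this environment"
    if found != expected:
        return f"kfp {found} is installed, but this pipeline was tested with {expected}"
    return f"kfp {expected} is installed"


if __name__ == "__main__":
    print(check_kfp())
```

Run it with the virtual environment activated. If it reports a different version, reinstall with the pinned version shown above.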

After installing the packages, create the yaml file

Inside the venv, change to the directory that contains the pipeline definition (openvaccine-kaggle-competition-kfp.py) and run this command to generate a yaml file for the pipeline:

python3 openvaccine-kaggle-competition-kfp.py

Download the openvaccine-kaggle-competition-kfp.yaml file that was created to your local openvaccine-kaggle-competition directory

Step 23: Create an Experiment

(Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Experiments (KFP) > Create Experiment view
Name the experiment and click Next
Click on Experiments (KFP) to view the experiment you just created

Step 24: Create a Pipeline

(Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Pipelines > +Upload Pipeline view
Name the pipeline
Click on Upload a file
Upload the local openvaccine-kaggle-competition-kfp.yaml file
Click Create

Step 25: Create a Run

(Kubeflow as a Service) Click on Create Run in the view from the previous step
Choose the experiment we created in Step 23
Input your desired run parameters. For example:

TRIAL = 1
EPOCHS = 2
BATCH_SIZE = 64
EMBED_DIM = 100
HIDDEN_DIM = 128
DROPOUT = .2
SP_DROPOUT = .3
TRAIN_SEQUENCE_LENGTH = 107

Click Start
Click on the run name to view the runtime execution graph
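If you launch many runs, it may help to keep the parameters in one place and sanity-check them before typing them into the UI. The sketch below mirrors the example values above. The class name RunParams and the validation rules (dropout rates in [0, 1), all other values positive) are assumptions for illustration, and the exact parameter names expected by the pipeline depend on its definition.

```python
from dataclasses import asdict, dataclass


@dataclass
class RunParams:
    """Example run parameters, defaulting to the values listed above."""
    TRIAL: int = 1
    EPOCHS: int = 2
    BATCH_SIZE: int = 64
    EMBED_DIM: int = 100
    HIDDEN_DIM: int = 128
    DROPOUT: float = 0.2
    SP_DROPOUT: float = 0.3
    TRAIN_SEQUENCE_LENGTH: int = 107

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the assumed valid range."""
        rates = ("DROPOUT", "SP_DROPOUT")
        for name, value in asdict(self).items():
            if name in rates:
                if not 0.0 <= value < 1.0:
                    raise ValueError(f"{name} must be in [0, 1), got {value}")
            elif value < 1:
                raise ValueError(f"{name} must be positive, got {value}")

    def as_arguments(self) -> dict:
        """Validate and return the parameters as a plain dict."""
        self.validate()
        return asdict(self)


if __name__ == "__main__":
    print(RunParams(EPOCHS=5).as_arguments())
```

A dict like this could also be supplied as the run arguments if you later submit runs programmatically instead of through the dashboard.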

Troubleshooting Tips:

While running the pipeline as mentioned above you may come across this error:

errorlog:

kaggle.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Thu, 23 Jun 2022 11:31:18 GMT', 'Access-Control-Allow-Credentials': 'true', 'Set-Cookie': 'ka_sessionid=6817a347c75399a531148e19cad0aaeb; max-age=2626560; path=/, GCLB=CIGths3--ebbUg; path=/; HttpOnly', 'Transfer-Encoding': 'chunked', 'Vary': 
HTTP response body: b'{"code":403,"message":"You must accept this competition\\u0027s rules before you\\u0027ll be able to download files."}'

This error occurs for one of two reasons:

Your Kaggle account is not verified with your phone number.
The rules for this specific competition have not been accepted.

Let's accept the rules of the competition

Click on "I Understand and Accept". After this you will be prompted to verify your account using your phone number:

Add your phone number and Kaggle will send a code to your number. Enter this code to verify your account. (Note: the pipeline won't run if your Kaggle account is not verified.)
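The response body in the error log is JSON, so the cause can be read from its message field. As a small sketch, the function below turns a status code and body into a hint. The name explain_kaggle_error and the hint wording are illustrative, and the substring check is based only on the message shown in the log above.

```python
import json


def explain_kaggle_error(status: int, body: bytes) -> str:
    """Map an HTTP status and JSON response body to a short troubleshooting hint."""
    try:
        message = json.loads(body.decode("utf-8")).get("message", "")
    except (ValueError, AttributeError):
        # Body was not valid UTF-8 JSON, or not a JSON object.
        message = ""
    if status == 403 and "accept this competition" in message:
        return ("Accept the competition rules on Kaggle and verify your "
                "account by phone, then rerun the pipeline.")
    if status == 403:
        return "Access was forbidden; check your Kaggle credentials and account status."
    return f"Unexpected response ({status}): {message or 'no message'}"


if __name__ == "__main__":
    body = (b'{"code":403,"message":"You must accept this competition\\u0027s '
            b'rules before you\\u0027ll be able to download files."}')
    print(explain_kaggle_error(403, body))
```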

Success

After the Kaggle account is verified, the pipeline run should complete successfully.
