This example is based on the Facial Keypoints Detection Kaggle competition. The objective of this exercise is to predict keypoint positions on face images.
This pipeline was tested with Kubeflow 1.4 and kfp 1.1.2 on x86-64 systems, which includes all Intel- and AMD-based CPUs. ARM-based systems are not supported.
It is assumed that you have Kubeflow installed.
Docker is used to create an image to run each component in the pipeline.
Kubeflow Pipelines connects each Docker-based component to create a pipeline. Each pipeline is a reproducible workflow wherein we pass input arguments and run the entire workflow.
The input data needed to run this tutorial is pulled from Kaggle. To pull the data you need a Kaggle account: register with your email and password and choose a Kaggle username.
Once the Kaggle account is registered, you need an API token; API access is required to pull data from Kaggle. To get the token, go to your Kaggle profile and click your profile picture at the top right; you will see this option:
Select “Account” from the menu.
Scroll down to the “API” section and click “Create New API Token” :
This will download a file kaggle.json with the following contents:
{"username":"<your username>","key":"<your key>"}
Now substitute your username for <username> and your key for <api_token>, and create a Kubernetes secret using:
kubectl create secret generic kaggle-secret --from-literal=KAGGLE_USERNAME=<username> --from-literal=KAGGLE_KEY=<api_token>
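The key names matter: KAGGLE_USERNAME and KAGGLE_KEY are the exact environment variables the Kaggle API client looks for. Once the secret is mounted into a pod (at /secret/kaggle, as configured in the next step), each key appears as a file there, so the data-pulling code can load the credentials along these lines (a minimal sketch, not the repo's exact code):

import os

# Read the mounted secret files and expose them as the environment
# variables the kaggle package expects. The mount path /secret/kaggle
# comes from the PodDefault resource configured in the next step.
for key in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
    with open(f"/secret/kaggle/{key}") as f:
        os.environ[key] = f.read().strip()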
We need a way to inject common data (env vars, volumes) into pods. In Kubeflow, the PodDefault resource serves this use case (reference: https://github.com/kubeflow/kubeflow/blob/master/components/admission-webhook/README.md). Using a PodDefault resource we can attach the secret to the container of our data-pulling step, which downloads the data using the Kaggle API. We create and apply the PodDefault resource as follows:
Create a resource.yaml file with the following code:
apiVersion: "kubeflow.org/v1alpha1"
kind: PodDefault
metadata:
  name: kaggle-access
spec:
  selector:
    matchLabels:
      kaggle-secret: "true"
  desc: "kaggle-access"
  volumeMounts:
  - name: secret-volume
    mountPath: /secret/kaggle
  volumes:
  - name: secret-volume
    secret:
      secretName: kaggle-secret
Apply the yaml with the following command:
kubectl apply -f resource.yaml
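A PodDefault is namespaced, so apply it in the namespace (Kubeflow profile) where your pipeline pods will run. A pipeline step then opts in by carrying the label from matchLabels above; with the kfp SDK this can be done with add_pod_label, for example (a minimal sketch with a placeholder image, not the repo's exact code):

import kfp.dsl as dsl

@dsl.pipeline(name="kaggle-access-example")
def example_pipeline():
    # Placeholder image; the real data-pulling component lives in the repo.
    download = dsl.ContainerOp(
        name="download-data",
        image="<docker_username>/<download_image>:<tag>",
    )
    # Matches the PodDefault's matchLabels, so the admission webhook
    # mounts the kaggle-secret volume at /secret/kaggle in this pod.
    download.add_pod_label("kaggle-secret", "true")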
Kubeflow relies on Docker images to create pipelines. These images are pushed to a Docker container registry, from which Kubeflow accesses them. For the purposes of this how-to we are going to use Docker Hub as our registry.
Start by creating a Docker account on Docker Hub (https://hub.docker.com/). After signing up, install Docker (https://docs.docker.com/get-docker/), then run the docker login command in your terminal and enter your Docker Hub username and password to log into Docker.
Create a new build environment with Docker installed, as described in the previous step. Then, in your new environment, open a terminal, navigate to the pipeline-components/train/ directory, and build the train Docker image using:
$ cd pipeline-components/train/
$ docker build -t <docker_username>/<docker_imagename>:<tag> .
For example:
$ docker build -t hubdocker76/demotrain:v8 .
After building, push the image using:
$ docker push hubdocker76/demotrain:v8
Next, in your Docker environment, open a terminal, navigate to the pipeline-components/eval/ directory, and build the evaluate Docker image using:
$ cd pipeline-components/eval/
$ docker build -t <docker_username>/<docker_imagename>:<tag> .
For example:
$ docker build -t hubdocker76/demoeval:v3 .
After building, push the image using:
$ docker push hubdocker76/demoeval:v3
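Both pushed images are referenced from the pipeline definition. A sketch of how they are wired into pipeline steps (step names and ordering are illustrative; facial-keypoints-detection-kfp.py is authoritative):

import kfp.dsl as dsl

@dsl.pipeline(name="train-eval-example")
def train_eval_pipeline():
    # Each step runs one of the images built and pushed above.
    train = dsl.ContainerOp(name="train", image="hubdocker76/demotrain:v8")
    evaluate = dsl.ContainerOp(name="eval", image="hubdocker76/demoeval:v3")
    # Evaluation consumes the training output, so order it after training.
    evaluate.after(train)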
Next we create a virtual environment that contains all the components we need to convert our Python pipeline code to a YAML file.
Steps to build a Python virtual environment:
Step a) Update pip
python3 -m pip install --upgrade pip
Step b) Install virtualenv
sudo pip3 install virtualenv
Step c) Check the installed version of virtualenv
virtualenv --version
Step d) Create a virtual environment named kfp
virtualenv kfp
Step e) Activate your venv.
source kfp/bin/activate
The virtual environment is now active. Inside the activated venv, install the following packages:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y git python3-pip
python3 -m pip install kfp==1.1.2
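To confirm the SDK installed correctly (an optional sanity check):
python3 -c "import kfp; print(kfp.__version__)"
This should print 1.1.2.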
After installing the packages, create the YAML file. Inside the venv, point your terminal to the directory that contains the pipeline definition file (facial-keypoints-detection-kfp.py) and run:
$ python3 facial-keypoints-detection-kfp.py
…this will generate a YAML file, facial-keypoints-detection-kfp.py.yaml, which can then be uploaded to the Kubeflow Pipelines UI to create a pipeline run. The same YAML file is also generated if you run the facial-keypoints-detection-kfp.ipynb notebook in the Notebook Server UI.
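The compile step at the end of such a script is what produces that file name; a minimal sketch of the pattern (kfp 1.1.2; the step arguments are illustrative, the repo's script is authoritative):

import kfp
import kfp.dsl as dsl

@dsl.pipeline(name="facial-keypoints-detection")
def facial_keypoints_pipeline(trial: int = 5, epoch: int = 8, patience: int = 3):
    # Training step using the image pushed earlier; the flags are illustrative.
    dsl.ContainerOp(
        name="train",
        image="hubdocker76/demotrain:v8",
        arguments=["--trial", trial, "--epoch", epoch, "--patience", patience],
    )

if __name__ == "__main__":
    # Compiling to __file__ + '.yaml' yields facial-keypoints-detection-kfp.py.yaml
    kfp.compiler.Compiler().compile(facial_keypoints_pipeline, __file__ + ".yaml")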
Create a Run by selecting the pipeline and giving the run a name. Enter integer values for trial, epoch, and patience:
Ideal values for trial, epoch, and patience are trial=5, epoch=8, patience=3.
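As an alternative to the UI, the run can also be created from Python with the kfp client (a sketch, assuming the client can reach the Pipelines API, e.g. from a notebook inside the cluster):

import kfp

client = kfp.Client()  # connects automatically from an in-cluster notebook
client.create_run_from_pipeline_package(
    "facial-keypoints-detection-kfp.py.yaml",
    arguments={"trial": 5, "epoch": 8, "patience": 3},
    run_name="facial-keypoints-demo",
)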
To run this pipeline using the Kale JupyterLab extension, upload the facial-keypoints-detection-kale.ipynb file to your Kubeflow deployment where Kale is enabled. Once uploaded, as a necessary first step run the cell tagged skip, which contains the function download_kaggle_dataset:
This downloads the data using the Kaggle API (use the <username> and <api_token> values highlighted in the Apply PodDefault resource step) and saves the downloaded data to the my_data folder.
Only after the data is downloaded, click “Compile and Run” to create a pipeline run.
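For reference, a plausible sketch of what the download_kaggle_dataset cell does (the notebook itself is authoritative): set the credentials and pull the competition files with the Kaggle API client.

import os

def download_kaggle_dataset(username: str, api_token: str, out_dir: str = "my_data"):
    # The kaggle package authenticates from these environment variables.
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = api_token
    import kaggle  # import after the env vars are set; authenticates on import
    # Download the competition archive into the my_data folder.
    kaggle.api.competition_download_files("facial-keypoints-detection", path=out_dir)

download_kaggle_dataset("<username>", "<api_token>")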