This example is based on the Titanic Kaggle competition (https://www.kaggle.com/c/titanic). The objective of this exercise is to use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.
This pipeline was tested with Kubeflow 1.4 and kfp 1.1.2 on both x86-64 and ARM-based systems, which covers Intel and AMD CPUs as well as M1/M2 MacBooks.
- If you haven’t already, sign up (https://www.arrikto.com/kubeflow-as-a-service/)
- Deploy Kubeflow
- The default configuration should work
- (Kubeflow as a Service) Open up a terminal in the Notebook Server and git clone the kubeflow/examples repository:
git clone https://github.com/kubeflow/examples
- If you haven’t already, sign up (https://hub.docker.com/) for DockerHub
- If you haven’t already, install Docker Desktop (https://www.docker.com/products/docker-desktop/) locally OR install the Docker command line utility (https://docs.docker.com/get-docker/), then run
sudo docker login
in your terminal and log into Docker with your DockerHub username and password
- If you haven’t already done so, sign up (https://www.kaggle.com/) for Kaggle
- (On Kaggle) Generate an API token (https://www.kaggle.com/docs/api)
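Generating the token downloads a kaggle.json file; the username and key inside it are the two values used to create the Kubernetes secret in the next step (the values below are placeholders):

```json
{"username": "<your_kaggle_username>", "key": "<your_api_token>"}
```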
- (Kubeflow as a Service) Create a Kubernetes secret holding your Kaggle credentials
kubectl create secret generic kaggle-secret --from-literal=KAGGLE_USERNAME=<username> --from-literal=KAGGLE_KEY=<api_token>
- (Locally) If you don’t have it already, install Git
- (Locally) Git clone the kubeflow/examples repository
git clone https://github.com/kubeflow/examples
- (Kubeflow as a Service) Navigate to the titanic-kaggle-competition directory
- Create a resource.yaml file
resource.yaml:
apiVersion: "kubeflow.org/v1alpha1"
kind: PodDefault
metadata:
  name: kaggle-access
spec:
  selector:
    matchLabels:
      kaggle-secret: "true"
  desc: "kaggle-access"
  volumeMounts:
  - name: secret-volume
    mountPath: /secret/kaggle
  volumes:
  - name: secret-volume
    secret:
      secretName: kaggle-secret
- Apply the resource.yaml file:
kubectl apply -f resource.yaml
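The PodDefault above mounts each key of kaggle-secret as a file under /secret/kaggle in any pod labeled kaggle-secret: "true". As a hedged illustration of how a pipeline step could pick those credentials up (the helper name and file layout below are assumptions, not the repo's actual code; a temp directory stands in for /secret/kaggle so the snippet runs anywhere):

```python
import os
import tempfile

def load_kaggle_creds(secret_dir):
    """Read the Kaggle username and API key from files mounted by the secret."""
    creds = {}
    for key in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
        with open(os.path.join(secret_dir, key)) as f:
            creds[key] = f.read().strip()
    return creds

# Simulate the mounted secret with a temp dir; a real component would
# point load_kaggle_creds at /secret/kaggle instead.
with tempfile.TemporaryDirectory() as d:
    for key, value in [("KAGGLE_USERNAME", "alice"), ("KAGGLE_KEY", "token123")]:
        with open(os.path.join(d, key), "w") as f:
            f.write(value)
    creds = load_kaggle_creds(d)
```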
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/pre-process directory
- Open up the preprocess.py file
- Note the code in this file that will perform the actions required in the “preprocess-data” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
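The actual preprocess.py lives in the repo; as a hedged sketch of the kind of cleaning the “preprocess-data” step typically does on the Titanic data (the helper names are illustrative, and the column names follow the Kaggle dataset):

```python
def fill_missing_age(rows):
    """Replace missing Age values with the median of the known ages."""
    known = sorted(r["Age"] for r in rows if r["Age"] is not None)
    median = known[len(known) // 2]
    return [dict(r, Age=r["Age"] if r["Age"] is not None else median) for r in rows]

def encode_sex(rows):
    """Map the categorical Sex column to a numeric feature."""
    return [dict(r, Sex=0 if r["Sex"] == "male" else 1) for r in rows]

passengers = [
    {"Age": 22.0, "Sex": "male"},
    {"Age": None, "Sex": "female"},
    {"Age": 38.0, "Sex": "female"},
]
cleaned = encode_sex(fill_missing_age(passengers))
```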
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/featureengineering directory
- Open up the featureengg.py file
- Note the code in this file that will perform the actions required in the “featureengineering” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
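As a hedged illustration of a common Titanic feature-engineering step (FamilySize and IsAlone are standard derived features for this competition; featureengg.py in the repo may compute different ones):

```python
def add_family_features(row):
    """Derive family-size features from the sibling/spouse and parent/child counts."""
    size = row["SibSp"] + row["Parch"] + 1  # the passenger plus relatives aboard
    return dict(row, FamilySize=size, IsAlone=int(size == 1))

r = add_family_features({"SibSp": 1, "Parch": 2})
```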
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/decisiontree directory
- Open up the decisiontree.py file
- Note the code in this file that will perform the actions required in the “decision-tree” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
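The “decision-tree” step trains a decision-tree classifier on the engineered features. As a hedged illustration of the splitting criterion such a tree commonly uses (not the repo's actual code), here is the Gini impurity of a set of binary survival labels:

```python
def gini(labels):
    """Gini impurity of binary labels (0 = died, 1 = survived)."""
    n = len(labels)
    p1 = sum(labels) / n  # fraction that survived
    return 1.0 - p1 ** 2 - (1 - p1) ** 2

pure = gini([1, 1, 1, 1])   # a perfectly pure node has impurity 0.0
mixed = gini([0, 1, 0, 1])  # a 50/50 split has the maximum impurity, 0.5
```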
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/logisticregression directory
- Open up the regression.py file
- Note the code in this file that will perform the actions required in the “regression” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
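The “regression” step fits a logistic-regression model. At its core, such a model maps a weighted sum of features through the sigmoid to a survival probability; the weights below are made up for illustration and are not taken from the repo:

```python
import math

def predict_proba(features, weights, bias):
    """Logistic-regression probability: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# With these placeholder weights, z = -2 + 2*1 + (-1)*0 = 0, so p = 0.5.
p = predict_proba([1.0, 0.0], [2.0, -1.0], -2.0)
```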
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/naivebayes directory
- Open up the naivebayes.py file
- Note the code in this file that will perform the actions required in the “bayes” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/randomforest directory
- Open up the randomforest.py file
- Note the code in this file that will perform the actions required in the “random-forest” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/svm directory
- Open up the svm.py file
- Note the code in this file that will perform the actions required in the “svm” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
- (Locally) Navigate to the titanic-kaggle-competition/pipeline-components/results directory
- Open up the result.py file
- Note the code in this file that will perform the actions required in the “results” pipeline step
- In the same directory, build the Docker image if you are using arm64 (Apple M1) locally:
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 .
- OR build the Docker image if you are using amd64 locally:
docker build -t <docker_username>/<docker_imagename>:<tag> .
- Push the Docker image if you are using arm64 (Apple M1) locally:
docker push <docker_username>/<docker_imagename>:<tag>-amd64
- OR push the Docker image if you are using amd64 locally:
docker push <docker_username>/<docker_imagename>:<tag>
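The “results” step gathers the accuracy reported by each model step and reports the best one. A minimal sketch of that comparison (the model names and scores below are placeholders, not real results):

```python
def best_model(scores):
    """Return the (name, accuracy) pair with the highest accuracy."""
    return max(scores.items(), key=lambda kv: kv[1])

scores = {"decision_tree": 0.79, "logistic_regression": 0.81, "svm": 0.78}
winner, acc = best_model(scores)
```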
- (Kubeflow as a Service) Navigate to the titanic-kaggle-competition directory
- Update titanic-kfp.py with the Docker image you pushed for each pipeline step:
return dsl.ContainerOp(
    name = 'Preprocess Data',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'featureengineering',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'regression',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'bayes',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'random_forest',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'decision_tree',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'svm',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
return dsl.ContainerOp(
    name = 'results',
    image = '<dockerhub username>/<image name>:<tag>',
    ...
- (Kubeflow as a Service) Navigate to the titanic-kaggle-competition directory
- Create a Python virtual environment:
Step a) Update pip
python3 -m pip install --upgrade pip
Step b) Install virtualenv
sudo pip3 install virtualenv
Step c) Check the installed version of virtualenv
virtualenv --version
Step d) Name your virtual environment kfp
virtualenv kfp
Step e) Activate your virtual environment
source kfp/bin/activate
Once the virtual environment is active, install the following packages inside it:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y git python3-pip
python3 -m pip install kfp==1.1.2
After installing the packages, generate the pipeline yaml file:
python3 titanic-kaggle-competition-kfp.py
Download the titanic-kaggle-competition-kfp.yaml file that was created to your local titanic-kaggle-competition directory.
- (Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Experiments (KFP) > Create Experiment view
- Name the experiment and click Next
- Click on Experiments (KFP) to view the experiment you just created
- (Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Pipelines > +Upload Pipeline view
- Name the pipeline
- Click on Upload a file
- Upload the local titanic-kaggle-competition-kfp.yaml file
- Click Create
- (Kubeflow as a Service) Click on Create Run in the view from the previous step
- Choose the experiment you created earlier
- Click Start
- Click on the run name to view the runtime execution graph
While running the pipeline as described above, you may come across the following error:
kaggle.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Thu, 23 Jun 2022 11:31:18 GMT', 'Access-Control-Allow-Credentials': 'true', 'Set-Cookie': 'ka_sessionid=6817a347c75399a531148e19cad0aaeb; max-age=2626560; path=/, GCLB=CIGths3--ebbUg; path=/; HttpOnly', 'Transfer-Encoding': 'chunked', 'Vary':
HTTP response body: b'{"code":403,"message":"You must accept this competition\\u0027s rules before you\\u0027ll be able to download files."}'
This error occurs for two reasons:
- Your Kaggle account is not verified with your phone number.
- The rules for this specific competition have not been accepted.
To fix it, verify your Kaggle account using your phone number and accept the rules for this specific competition. Until both of these steps are done, the pipeline cannot acquire data from the Kaggle API and won’t run.