Kaggle to kfp (kubeflow#938)
* Add files via upload

* Kaggle to kfp

Converted the Kaggle Facial-Keypoint-Detection notebook to a Kubeflow pipeline

* Kaggle to kfp
ajinkya933 authored Apr 19, 2022
1 parent 97cb872 commit 7a02695
Showing 7 changed files with 313 additions and 0 deletions.
43 changes: 43 additions & 0 deletions Facial-Keypoint-Detection/Readme.md
@@ -0,0 +1,43 @@
# Objective
Here we convert the code from the Kaggle Facial Keypoints Detection competition (https://www.kaggle.com/competitions/facial-keypoints-detection) into a kfp pipeline.
The objective of this task is to predict keypoint positions on face images.

# Testing environment
The pipeline was tested on `Kubeflow 1.4` with `kfp 1.1.2`; it should also be compatible with previous Kubeflow releases. The kfp version used for testing can be installed with `pip install kfp==1.1.2`.

# Components used

## Docker
Docker is used to create the environment in which each component runs.

## Kubeflow pipelines
Kubeflow Pipelines connects the Docker components into a pipeline. Each Kubeflow pipeline is a reproducible workflow to which we pass input arguments and then run end to end.

# Docker
We start by creating a Docker account on Docker Hub (https://hub.docker.com/) and signing up with our individual email. After signup is complete, log in to Docker from your terminal with the command `docker login`, using your username and password.

## Build train image
Navigate to the `train` directory, create a folder named `my_data`, place the `training.zip` and `test.zip` files from the Kaggle competition in this folder, and build the Docker image using:
```
docker build -t <docker_username>/<docker_imagename>:<tag> .
```
In my case this is:
```
docker build -t hubdocker76/demotrain:v1 .
```
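
Optionally, the train image can be smoke-tested locally before wiring it into the pipeline. This is a minimal sketch, assuming the image was built as above and that a local `data/` directory stands in for the volume the pipeline later mounts at `/data`:
```
docker run --rm -v $(pwd)/data:/data hubdocker76/demotrain:v1 \
    python3 train.py --trial 1 --epoch 1 --patience 3
```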

## Build evaluate image
Navigate to the `eval` directory and build the Docker image using:
```
docker build -t <docker_username>/<docker_imagename>:<tag> .
```
In my case this is:
```
docker build -t hubdocker76/demoeval:v2 .
```
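
If your Kubeflow cluster pulls images from Docker Hub, both images also need to be pushed after building. For example, with the tags used above (substitute your own username and tags):
```
docker push hubdocker76/demotrain:v1
docker push hubdocker76/demoeval:v2
```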
# Kubeflow pipelines

Go to the `generate-pipeline` directory and run `python3 my_pipeline.py`. This generates a YAML file, which we can upload to the Kubeflow Pipelines UI and create a Run from.

# Sample pipeline to run on Kubeflow
A pre-compiled copy of the pipeline is included as `face_pipeline_01.yaml` in the `generate-pipeline` directory. Upload this pipeline to Kubeflow and start a Run.
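
As an alternative to uploading the YAML through the UI, the run can also be created programmatically with the kfp SDK. The following is a minimal sketch; the host URL, experiment name, and parameter values are placeholders that depend on your Kubeflow deployment:
```
import kfp

# Connect to the Kubeflow Pipelines API; the host below is a placeholder and can
# often be omitted when running from a notebook inside the cluster.
client = kfp.Client(host='http://<kubeflow-host>/pipeline')

# Create (or reuse) an experiment and start a run from the compiled workflow,
# passing the three parameters that train.py expects (example values only).
experiment = client.create_experiment('facial-keypoints')
run = client.run_pipeline(
    experiment.id,
    job_name='facial-keypoints-run',
    pipeline_package_path='face_pipeline_01.yaml',
    params={'trial': 1, 'epoch': 10, 'patience': 3},
)
```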
14 changes: 14 additions & 0 deletions Facial-Keypoint-Detection/eval/Dockerfile
@@ -0,0 +1,14 @@
FROM "ubuntu:bionic"
RUN apt-get update && apt-get upgrade -y
RUN mkdir -p /tensorflow/models
RUN apt-get install -y git python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install tensorflow
RUN pip3 install jupyter
RUN pip3 install matplotlib
RUN pip3 install kfp==1.1.2
RUN pip3 install opencv-python-headless
RUN pip3 install pandas keras
RUN pip3 install scikit-learn
RUN pip3 install autokeras
COPY . /
28 changes: 28 additions & 0 deletions Facial-Keypoint-Detection/eval/eval.py
@@ -0,0 +1,28 @@
from tensorflow.keras.models import load_model
import autokeras as ak
import pandas as pd
import numpy as np

### Load model
loaded_model = load_model("/data/model_autokeras", custom_objects=ak.CUSTOM_OBJECTS)

### Print model summary
print(loaded_model.summary())

test_dir='/data/test.csv'
test=pd.read_csv(test_dir)

X_test = []
for img in test['Image']:
    X_test.append(np.asarray(img.split(), dtype=float).reshape(96, 96, 1))
X_test = np.reshape(X_test, (-1, 96, 96, 1))
X_test = np.asarray(X_test).astype('float32')

### predict
y_pred = loaded_model.predict(X_test)

### Create submission file
y_pred= y_pred.reshape(-1,)
submission = pd.DataFrame({'Location': y_pred})
submission.to_csv('/data/submission.csv', index=True , index_label='RowId')

93 changes: 93 additions & 0 deletions Facial-Keypoint-Detection/generate-pipeline/face_pipeline_01.yaml
@@ -0,0 +1,93 @@
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: face-pipeline-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.1.2, pipelines.kubeflow.org/pipeline_compilation_time: '2022-03-27T11:03:51.876586',
    pipelines.kubeflow.org/pipeline_spec: '{"description": "pipeline to detect facial
      landmarks", "inputs": [{"name": "trial"}, {"name": "epoch"}, {"name": "patience"}],
      "name": "face pipeline"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.1.2}
spec:
  entrypoint: face-pipeline
  templates:
  - name: evaluate
    container:
      command: [python3, eval.py]
      image: hubdocker76/demoeval:v2
      volumeMounts:
      - {mountPath: /data, name: pvc}
    inputs:
      parameters:
      - {name: pvc-name}
    volumes:
    - name: pvc
      persistentVolumeClaim: {claimName: '{{inputs.parameters.pvc-name}}'}
  - name: face-pipeline
    inputs:
      parameters:
      - {name: epoch}
      - {name: patience}
      - {name: trial}
    dag:
      tasks:
      - name: evaluate
        template: evaluate
        dependencies: [pvc, train]
        arguments:
          parameters:
          - {name: pvc-name, value: '{{tasks.pvc.outputs.parameters.pvc-name}}'}
      - {name: pvc, template: pvc}
      - name: train
        template: train
        dependencies: [pvc]
        arguments:
          parameters:
          - {name: epoch, value: '{{inputs.parameters.epoch}}'}
          - {name: patience, value: '{{inputs.parameters.patience}}'}
          - {name: pvc-name, value: '{{tasks.pvc.outputs.parameters.pvc-name}}'}
          - {name: trial, value: '{{inputs.parameters.trial}}'}
  - name: pvc
    resource:
      action: create
      manifest: |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-pvc'
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
    outputs:
      parameters:
      - name: pvc-manifest
        valueFrom: {jsonPath: '{}'}
      - name: pvc-name
        valueFrom: {jsonPath: '{.metadata.name}'}
      - name: pvc-size
        valueFrom: {jsonPath: '{.status.capacity.storage}'}
  - name: train
    container:
      args: [--trial, '{{inputs.parameters.trial}}', --epoch, '{{inputs.parameters.epoch}}',
        --patience, '{{inputs.parameters.patience}}']
      command: [python3, train.py]
      image: hubdocker76/demotrain:v1
      volumeMounts:
      - {mountPath: /data, name: pvc}
    inputs:
      parameters:
      - {name: epoch}
      - {name: patience}
      - {name: pvc-name}
      - {name: trial}
    volumes:
    - name: pvc
      persistentVolumeClaim: {claimName: '{{inputs.parameters.pvc-name}}'}
  arguments:
    parameters:
    - {name: trial}
    - {name: epoch}
    - {name: patience}
  serviceAccountName: pipeline-runner
42 changes: 42 additions & 0 deletions Facial-Keypoint-Detection/generate-pipeline/my_pipeline.py
@@ -0,0 +1,42 @@
import kfp
from kfp import dsl

def SendMsg(trial, epoch, patience):
    # Create a 1Gi ReadWriteOnce volume that is shared between the train and evaluate steps
    vop = dsl.VolumeOp(name="pvc",
                       resource_name="pvc", size='1Gi',
                       modes=dsl.VOLUME_MODE_RWO)

    # Training step: runs train.py in the train image with the volume mounted at /data
    return dsl.ContainerOp(
        name = 'Train',
        image = 'hubdocker76/demotrain:v1',
        command = ['python3', 'train.py'],
        arguments=[
            '--trial', trial,
            '--epoch', epoch,
            '--patience', patience
        ],
        pvolumes={
            '/data': vop.volume
        }
    )

def GetMsg(comp1):
    # Evaluation step: reuses the training step's volume to read the exported model from /data
    return dsl.ContainerOp(
        name = 'Evaluate',
        image = 'hubdocker76/demoeval:v2',
        pvolumes={
            '/data': comp1.pvolumes['/data']
        },
        command = ['python3', 'eval.py']
    )

@dsl.pipeline(
    name = 'face pipeline',
    description = 'pipeline to detect facial landmarks')
def passing_parameter(trial, epoch, patience):
    comp1 = SendMsg(trial, epoch, patience)
    comp2 = GetMsg(comp1)

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(passing_parameter, __file__ + '.yaml')
14 changes: 14 additions & 0 deletions Facial-Keypoint-Detection/train/dockerfile
@@ -0,0 +1,14 @@
FROM "ubuntu:bionic"
RUN apt-get update && apt-get upgrade -y
RUN mkdir -p /tensorflow/models
RUN apt-get install -y git python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install tensorflow
RUN pip3 install jupyter
RUN pip3 install matplotlib
RUN pip3 install kfp==1.1.2
RUN pip3 install opencv-python-headless
RUN pip3 install pandas keras
RUN pip3 install scikit-learn
RUN pip3 install autokeras
COPY . /
79 changes: 79 additions & 0 deletions Facial-Keypoint-Detection/train/train.py
@@ -0,0 +1,79 @@
import numpy as np
import os
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
from tensorflow.keras.models import load_model
import shutil
import argparse
import autokeras as ak

### Declaring input arguments

parser = argparse.ArgumentParser()
parser.add_argument('--trial', type=int)
parser.add_argument('--epoch', type=int)
parser.add_argument('--patience', type=int)

args = vars(parser.parse_args())

trials = args['trial']
epochs = args['epoch']
patience = args['patience']

project="Facial-keypoints"
run_id= "1.8"
resume_run = True

MAX_TRIALS=trials
EPOCHS=epochs
PATIENCE=patience

### Data Extraction : extract data and save to attached extenal pvc at location /data ###

base_dir='my_data/'
train_dir_zip=base_dir+'training.zip'
test_dir_zip=base_dir+'test.zip'

from zipfile import ZipFile
with ZipFile(train_dir_zip, 'r') as zipObj:
    zipObj.extractall('/data')
    print("Train Archive unzipped")
with ZipFile(test_dir_zip, 'r') as zipObj:
    zipObj.extractall('/data')
    print("Test Archive unzipped")


## Data preprocess

train_dir='/data/training.csv'
test_dir='/data/test.csv'
train=pd.read_csv(train_dir)
test=pd.read_csv(test_dir)

train=train.dropna()
train=train.reset_index(drop=True)

X_train=[]
Y_train=[]

for img in train['Image']:
    X_train.append(np.asarray(img.split(), dtype=float).reshape(96, 96, 1))
X_train = np.reshape(X_train, (-1, 96, 96, 1))
X_train = np.asarray(X_train).astype('float32')

for i in range(len(train)):
    Y_train.append(np.asarray(train.iloc[i][0:30].to_numpy()))
Y_train = np.asarray(Y_train).astype('float32')


## Data training

reg = ak.ImageRegressor(max_trials=MAX_TRIALS)
reg.fit(X_train, Y_train, validation_split=0.15, epochs=EPOCHS)

# Export trained model to externally attached pvc
my_model = reg.export_model()
my_model.save('/data/model_autokeras', save_format="tf")
