digit-recognition-kaggle-competition

Objective

Here we convert the https://www.kaggle.com/competitions/digit-recognizer code to a Kubeflow pipeline. The objective of this task is to correctly identify digits from a dataset of tens of thousands of handwritten images.

Testing Environment

Environment:

Name	Version
Kubeflow	v1.4
kfp	1.8.11
kubeflow-kale	0.6.0
pip	21.3.1

The KFP version used for testing can be installed with: pip install kfp==1.8.11

Section 1: KFP Pipeline

Kubeflow lightweight component method

Here, a Python function is created to carry out a specific task, and the function is passed to the kfp component method create_component_from_func.
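
As a minimal sketch of this method (not the code from the notebook), the example below wraps a small function as a lightweight component. The function normalize_pixels, the helper build_component and the base image python:3.8 are illustrative assumptions. kfp is imported lazily so the plain function can still be called without kfp 1.8.x installed.

```python
def normalize_pixels(pixels_csv: str) -> str:
    """Scale comma-separated 0-255 pixel values to the range 0-1."""
    values = [int(v) / 255.0 for v in pixels_csv.split(",")]
    return ",".join(f"{v:.3f}" for v in values)


def build_component():
    """Wrap normalize_pixels as a KFP lightweight component (needs kfp 1.8.x)."""
    # Imported here so the plain function above works without kfp installed.
    from kfp.components import create_component_from_func

    # base_image is an assumption of this sketch; any image with Python works.
    return create_component_from_func(normalize_pixels, base_image="python:3.8")
```

Calling normalize_pixels directly is a quick way to test the logic before it runs inside a container. Because only the function's source is shipped to the container, any imports the function needs should be placed inside the function body.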

Kubeflow pipelines

A Kubeflow pipeline connects all components together to create a directed acyclic graph (DAG). The kfp dsl.pipeline decorator was used to create a pipeline function. The kfp component annotations InputPath and OutputPath were used to pass data between components.
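
To illustrate file-based data passing, here is a hedged sketch with two functions annotated with OutputPath and InputPath. The function names, the "CSV" type label and the toy rows are assumptions made for this example, and the fallback stand-ins exist only so the sketch can be tried locally when kfp 1.8.x is not installed.

```python
import os
import tempfile

try:
    from kfp.components import InputPath, OutputPath
except ImportError:
    # Stand-ins (an assumption of this sketch) so the functions below can be
    # exercised locally when kfp 1.8.x is not available.
    def InputPath(type_name=None):
        return str

    OutputPath = InputPath


def make_rows(rows_path: OutputPath("CSV")):
    """Write a tiny label/pixel table to the path that KFP provides."""
    import csv  # imports live inside the function for lightweight components

    with open(rows_path, "w", newline="") as f:
        csv.writer(f).writerows([["label", "pixel0"], [7, 0], [2, 255]])


def count_rows(rows_path: InputPath("CSV")) -> int:
    """Read the file produced upstream and return the number of data rows."""
    import csv

    with open(rows_path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1  # minus the header row


def run_locally() -> int:
    """Chain both functions through a temporary file, outside of KFP."""
    path = os.path.join(tempfile.mkdtemp(), "rows.csv")
    make_rows(path)
    return count_rows(path)
```

Inside a pipeline, both functions would first be wrapped with create_component_from_func. As far as we know, kfp 1.8 strips the _path suffix from these parameter names, so the upstream output would be referenced as outputs["rows"] and passed to the downstream step as rows=...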

Finally, create_run_from_pipeline_func was used to submit the pipeline directly from the pipeline function.
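
The sketch below shows how a pipeline function might be defined and submitted this way. It is not the pipeline from the notebook: the toy add step, the pipeline name add-demo, the base image and the helper names build_pipeline and submit are all assumptions for illustration. Submitting requires kfp 1.8.x and access to a running Kubeflow Pipelines endpoint.

```python
def add(a: float, b: float) -> float:
    """Toy step standing in for a real pipeline task."""
    return a + b


def build_pipeline():
    """Return a pipeline function that chains two add steps (needs kfp 1.8.x)."""
    from kfp import dsl
    from kfp.components import create_component_from_func

    add_op = create_component_from_func(add, base_image="python:3.8")

    @dsl.pipeline(name="add-demo", description="Two chained additions")
    def add_pipeline(a: float = 1.0, b: float = 2.0):
        first = add_op(a, b)
        # Consuming first.output is what creates the edge in the DAG.
        add_op(first.output, b)

    return add_pipeline


def submit(host=None):
    """Submit the pipeline; host=None relies on in-cluster defaults."""
    import kfp

    client = kfp.Client(host=host)
    return client.create_run_from_pipeline_func(
        build_pipeline(), arguments={"a": 1.0, "b": 2.0}
    )
```

From a Notebook Server inside the cluster, calling submit() with no arguments is usually enough; from outside the cluster, the host URL of the Pipelines API would have to be supplied.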

To create pipeline on KFP

1. Open your Kubeflow cluster, create a Notebook Server and connect to it.
2. Clone this repo and navigate to this directory.
3. Navigate to the data directory, download the compressed Kaggle data using this link, and store the training.zip, test.zip and sample_submission.csv files in the data folder.
4. Run the digit-recognizer-kfp notebook from start to finish.
5. View the run details immediately after submitting the pipeline.
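
Before running the notebook, it can help to confirm that the downloaded files are where the notebook expects them. The helper below is a small sketch for that check; the function name missing_files is hypothetical and is not part of the repo.

```python
from pathlib import Path


def missing_files(data_dir, expected_names):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected_names if not (data_dir / name).is_file()]
```

Call it with the data folder and the three file names listed in the steps above. An empty list means everything is in place.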

View Pipeline

Section 2: Kale Pipeline

To create pipeline using the Kale JupyterLab extension

1. Clone the GitHub repo and navigate to this directory.
2. Install the dependencies listed in the requirements.txt file.
3. Launch the digit-recognizer-kale.ipynb notebook.
4. Enable the Kale extension in JupyterLab.
5. The notebook's cells are automatically annotated with Kale tags. With the use of Kale tags we define the following:
   - Pipeline parameters are assigned using the "pipeline parameters" tag
   - The libraries needed throughout the pipeline are passed through the "imports" tag
   - Notebook cells are assigned to specific pipeline components (download data, load data, etc.) using the "pipeline step" tag
   - Cell dependencies between the different pipeline steps are defined with the "depends on" tag
6. Compile and run the notebook using Kale.
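
To make the role of the tags more concrete, the sketch below mimics how tagged cells could be turned into a step graph. It assumes that Kale stores its annotations in each cell's metadata under "tags", using spellings such as imports, pipeline-parameters, block:<step> and prev:<step>. That matches our understanding of Kale 0.6 but should be checked against the notebook's JSON. The function steps_from_cells is hypothetical and is not Kale's own code.

```python
def steps_from_cells(cells):
    """Map each pipeline step name to the list of steps it depends on."""
    deps = {}
    for cell in cells:
        tags = cell.get("metadata", {}).get("tags", [])
        blocks = [t.split(":", 1)[1] for t in tags if t.startswith("block:")]
        if not blocks:
            continue  # imports, pipeline-parameters or untagged cells
        step = blocks[0]
        deps.setdefault(step, [])
        for tag in tags:
            if tag.startswith("prev:"):
                parent = tag.split(":", 1)[1]
                if parent not in deps[step]:
                    deps[step].append(parent)
    return deps


# Toy notebook cells; the step names follow the examples mentioned above.
cells = [
    {"metadata": {"tags": ["imports"]}},
    {"metadata": {"tags": ["pipeline-parameters"]}},
    {"metadata": {"tags": ["block:download_data"]}},
    {"metadata": {"tags": ["block:load_data", "prev:download_data"]}},
]
```

Here steps_from_cells(cells) yields one entry per step, with load_data depending on download_data, which is the same DAG shape that Kale compiles into a KFP pipeline.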
