[GH Issue Summarization] Upgrade to kf v0.4.0-rc.2 (kubeflow#450)
* Update tfjob components to v1beta1

Remove old version of tensor2tensor component

* Combine UI into a single jsonnet file

* Upgrade GH issue summarization to kf v0.4.0-rc.2

Use latest ksonnet v0.13.1
Use latest seldon v1alpha2
Remove ksonnet app with full kubeflow platform & replace with components specific to this example.
Remove outdated scripts
Add cluster creation links to Click-to-deploy & kfctl
Add warning not to use the Training with an Estimator guide
Replace commandline with bash for better syntax highlighting
Replace messy port-forwarding commands with svc/ambassador
Add modelUrl param to ui component
Modify teardown instructions to remove the deployment
Fix grammatical mistakes

* Rearrange tfjob instructions
texasmichelle authored and k8s-ci-robot committed Dec 31, 2018
1 parent 7990408 commit 70a22d6
Showing 107 changed files with 385 additions and 86,534 deletions.
134 changes: 85 additions & 49 deletions github_issue_summarization/01_setup_a_kubeflow_cluster.md
@@ -1,45 +1,52 @@
 # Setup Kubeflow

-In this part, you will setup kubeflow on an existing kubernetes cluster.
+In this section, you will set up Kubeflow on an existing Kubernetes cluster.

 ## Requirements

-* A kubernetes cluster
-* To create a managed cluster run
-```commandline
-gcloud container clusters create kubeflow-examples-cluster
-```
-or use kubeadm: [docs](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/)
-* `kubectl` CLI (command line interface) pointing to the kubernetes cluster
+* A Kubernetes cluster
+* To create a cluster, follow the instructions on the
+[Set up Kubernetes](https://www.kubeflow.org/docs/started/getting-started/#set-up-kubernetes)
+section of the Kubeflow Getting Started guide. We recommend using a
+managed service such as Google Kubernetes Engine (GKE).
+[This link](https://www.kubeflow.org/docs/started/getting-started-gke/)
+guides you through the process of using either
+[Click-to-Deploy](https://deploy.kubeflow.cloud/#/deploy) (a web-based UI) or
+[`kfctl`](https://github.com/kubeflow/kubeflow/blob/master/scripts/kfctl.sh)
+(a CLI tool) to generate a GKE cluster with all Kubeflow components
+installed. Note that there is no need to complete the Deploy Kubeflow steps
+below if you use either of these two tools.
+* The Kubernetes CLI `kubectl` pointing to the Kubernetes cluster
 * Make sure that you can run `kubectl get nodes` from your terminal
 successfully
-* The ksonnet CLI, v0.9.2 or higher: [ks](https://ksonnet.io/#get-started)
+* The ksonnet CLI [`ks`](https://ksonnet.io/#get-started), v0.9.2 or higher:
 * In case you want to install a particular version of ksonnet, you can run

-```commandline
-export KS_VER=ks_0.11.0_linux_amd64
-wget -O /tmp/$KS_VER.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v0.11.0/$KS_VER.tar.gz
+```bash
+export KS_VER=0.13.1
+export KS_BIN=ks_${KS_VER}_linux_amd64
+wget -O /tmp/${KS_BIN}.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_BIN}.tar.gz
 mkdir -p ${HOME}/bin
-tar -xvf /tmp/$KS_VER.tar.gz -C ${HOME}/bin
-export PATH=$PATH:${HOME}/bin/$KS_VER
+tar -xvf /tmp/${KS_BIN}.tar.gz -C ${HOME}/bin
+export PATH=$PATH:${HOME}/bin/${KS_BIN}
 ```

 ## Kubeflow setup

-Refer to the [
-guide](https://www.kubeflow.org/docs/started/getting-started/) for
-detailed instructions on how to setup kubeflow on your kubernetes cluster.
+Refer to the [guide](https://www.kubeflow.org/docs/started/getting-started/) for
+detailed instructions on how to set up Kubeflow on your Kubernetes cluster.
 Specifically, complete the following sections:

-* [Deploy
-Kubeflow](https://www.kubeflow.org/docs/started/getting-started/)
-* The [ks-kubeflow](https://github.com/kubeflow/examples/tree/master/github_issue_summarization/ks-kubeflow)
-directory can be used instead of creating a ksonnet app from scratch.
-* If you run into
-[API rate limiting errors](https://github.com/ksonnet/ksonnet/blob/master/docs/troubleshooting.md#github-rate-limiting-errors), ensure you have a `${GITHUB_TOKEN}` environment variable set.
-* If you run into [RBAC permissions issues](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#rbac-clusters)
-running `ks apply` commands, be sure you have created a `cluster-admin` ClusterRoleBinding for your username.
+* [Deploy Kubeflow](https://www.kubeflow.org/docs/started/getting-started/)
+* The latest version that was tested with this walkthrough was v0.4.0-rc.2.
+* The [`kfctl`](https://github.com/kubeflow/kubeflow/blob/master/scripts/kfctl.sh)
+CLI tool can be used to install Kubeflow on an existing cluster. Follow
+[this guide](https://www.kubeflow.org/docs/started/getting-started/#kubeflow-quick-start)
+to use `kfctl` to generate a ksonnet app, create Kubeflow manifests, and
+install all default components onto an existing Kubernetes cluster. Note
+that you can likely skip this step if you used
+[Click-to-Deploy](https://deploy.kubeflow.cloud/#/deploy)
+or `kfctl` to generate your cluster.

 * [Setup a persistent disk](https://www.kubeflow.org/docs/guides/advanced/)

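For reference, the `kfctl` flow mentioned above follows an init/generate/apply pattern. A minimal sketch, assuming the v0.4-era `kfctl.sh` interface and a GCP deployment; the app directory and project names are placeholders:

```bash
# Sketch of the kfctl deployment flow; names are illustrative, not prescriptive.
export KFAPP=kf-app              # directory kfctl.sh creates for the ksonnet app
export PROJECT=my-gcp-project    # placeholder GCP project ID

kfctl.sh init ${KFAPP} --platform gcp --project ${PROJECT}
cd ${KFAPP}
kfctl.sh generate all            # generate the ksonnet app and platform configs
kfctl.sh apply all               # create GCP resources and deploy Kubeflow
```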
@@ -49,9 +56,9 @@ Kubeflow](https://www.kubeflow.org/docs/started/getting-started/)
 * For this example, provision a `10GB` cluster-wide shared NFS mount with the
 name `github-issues-data`.

-* After the NFS is ready, delete the `tf-hub-0` pod so that it gets recreated and
+* After the NFS is ready, delete the `jupyter-0` pod so that it gets recreated and
 picks up the NFS mount. You can delete it by running `kubectl delete pod
-tf-hub-0 -n=${NAMESPACE}`
+jupyter-0 -n=${NAMESPACE}`

 * [Bringing up a
 Notebook](https://www.kubeflow.org/docs/guides/components/jupyter/)
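To confirm that the recreated pod actually picked up the NFS volume, a quick check along these lines can help. A sketch, assuming `kubectl` v1.11+ (for `kubectl wait`) and that the mount path contains the `github-issues-data` name; adjust the grep target to your deployment:

```bash
# Recreate the pod, wait until it is Ready, then look for the NFS mount inside it.
kubectl delete pod jupyter-0 -n=${NAMESPACE}
kubectl wait --for=condition=Ready pod/jupyter-0 -n=${NAMESPACE} --timeout=300s
kubectl exec jupyter-0 -n=${NAMESPACE} -- df -h | grep github-issues-data
```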
@@ -62,19 +69,44 @@ Notebook](https://www.kubeflow.org/docs/guides/components/jupyter/)

 After completing that, you should have the following ready:

-* A ksonnet app in a directory named `ks-kubeflow`
-* An output similar to this for `kubectl get pods` command
-```commandline
-NAME                                  READY     STATUS    RESTARTS   AGE
-ambassador-75bb54594-dnxsd            2/2       Running   0          3m
-ambassador-75bb54594-hjj6m            2/2       Running   0          3m
-ambassador-75bb54594-z948h            2/2       Running   0          3m
-jupyter-chasm                         1/1       Running   0          49s
-spartakus-volunteer-565b99cd69-knjf2  1/1       Running   0          3m
-tf-hub-0                              1/1       Running   0          3m
-tf-job-dashboard-6c757d8684-d299l     1/1       Running   0          3m
-tf-job-operator-77776c8446-lpprm      1/1       Running   0          3m
+* A ksonnet app in a directory named `ks_app`
+* An output similar to this for `kubectl -n kubeflow get pods` command
+
+```bash
+NAME                                                      READY     STATUS      RESTARTS   AGE
+ambassador-5cf8cd97d5-6qlpz                               1/1       Running     0          3m
+ambassador-5cf8cd97d5-rqzkx                               1/1       Running     0          3m
+ambassador-5cf8cd97d5-wz9hl                               1/1       Running     0          3m
+argo-ui-7c9c69d464-xpphz                                  1/1       Running     0          3m
+centraldashboard-6f47d694bd-7jfmw                         1/1       Running     0          3m
+cert-manager-5cb7b9fb67-qjd9p                             1/1       Running     0          3m
+cm-acme-http-solver-2jr47                                 1/1       Running     0          3m
+ingress-bootstrap-x6whr                                   1/1       Running     0          3m
+jupyter-0                                                 1/1       Running     0          3m
+jupyter-chasm                                             1/1       Running     0          49s
+katib-ui-54b4667bc6-cg4jk                                 1/1       Running     0          3m
+metacontroller-0                                          1/1       Running     0          3m
+minio-7bfcc6c7b9-qrshc                                    1/1       Running     0          3m
+ml-pipeline-b59b58dd6-bwm8t                               1/1       Running     0          3m
+ml-pipeline-persistenceagent-9ff99498c-v4k8f              1/1       Running     0          3m
+ml-pipeline-scheduledworkflow-78794fd86f-4tzxp            1/1       Running     0          3m
+ml-pipeline-ui-9884fd997-7jkdk                            1/1       Running     0          3m
+ml-pipelines-load-samples-668gj                           0/1       Completed   0          3m
+mysql-6f6b5f7b64-qgbkz                                    1/1       Running     0          3m
+pytorch-operator-6f87db67b7-nld5h                         1/1       Running     0          3m
+spartakus-volunteer-7c77dc796-7jgtd                       1/1       Running     0          3m
+studyjob-controller-68c6fc5bc8-jkc9q                      1/1       Running     0          3m
+tf-job-dashboard-5f986cf99d-kb6gp                         1/1       Running     0          3m
+tf-job-operator-v1beta1-5876c48976-q96nh                  1/1       Running     0          3m
+vizier-core-78f57695d6-5t8z7                              1/1       Running     0          3m
+vizier-core-rest-7d7dd7dbb8-dbr7n                         1/1       Running     0          3m
+vizier-db-777675b958-c46qh                                1/1       Running     0          3m
+vizier-suggestion-bayesianoptimization-7f46d8cb47-wlltt   1/1       Running     0          3m
+vizier-suggestion-grid-64c5f8bdf-2bznv                    1/1       Running     0          3m
+vizier-suggestion-hyperband-8546bf5885-54hr6              1/1       Running     0          3m
+vizier-suggestion-random-c4c8d8667-l96vs                  1/1       Running     0          3m
+whoami-app-7b575b555d-85nb8                               1/1       Running     0          3m
+workflow-controller-5c95f95f58-hprd5                      1/1       Running     0          3m
 ```

 * A Jupyter Notebook accessible at http://127.0.0.1:8000
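If the notebook is not reachable at that address yet, port-forwarding is one way to get there. A sketch, assuming JupyterHub runs in the `jupyter-0` pod and listens on port 8000; both are assumptions that may vary by deployment:

```bash
# Forward local port 8000 to the JupyterHub pod, then browse to http://127.0.0.1:8000
kubectl port-forward jupyter-0 8000:8000 -n=${NAMESPACE}
```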
@@ -83,10 +115,14 @@ tf-job-operator-77776c8446-lpprm 1/1 Running 0

 ## Summary

-* We created a ksonnet app for our kubeflow deployment
-* We deployed the kubeflow-core component to our kubernetes cluster
-* We created a disk for storing our training data
-* We connected to JupyterHub and spawned a new Jupyter notebook
-* For additional details and self-paced learning scenarios check `Resources` section of the [getting started guide](https://www.kubeflow.org/docs/started/getting-started/)
-
-*Next*: [Training the model](02_training_the_model.md)
+* We created a ksonnet app for our Kubeflow deployment: `ks_app`.
+* We deployed the default Kubeflow components to our Kubernetes cluster.
+* We created a disk for storing our training data.
+* We connected to JupyterHub and spawned a new Jupyter notebook.
+* For additional details and self-paced learning scenarios related to this
+example, see the
+[Resources](https://www.kubeflow.org/docs/started/getting-started/#resources)
+section of the
+[Getting Started Guide](https://www.kubeflow.org/docs/started/getting-started/).
+
+*Next*: [Training the model with a notebook](02_training_the_model.md)
21 changes: 14 additions & 7 deletions github_issue_summarization/02_distributed_training.md
@@ -1,23 +1,26 @@
 # Distributed training using Estimator

-Distributed training with keras currently doesn't work; see
+Distributed training with Keras currently does not work. Do not follow this guide
+until these issues have been resolved:

-* kubeflow/examples#280
-* kubeflow/examples#96
+* [kubeflow/examples#280](https://github.com/kubeflow/examples/issues/280)
+* [kubeflow/examples#196](https://github.com/kubeflow/examples/issues/196)

-Requires Tensorflow 1.9 or later.
+Requires TensorFlow 1.9 or later.
 Requires [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) capable of creating ReadWriteMany persistent volumes.

 On GKE you can follow [GCFS documentation](https://master.kubeflow.org/docs/started/getting-started-gke/#using-gcfs-with-kubeflow) to enable it.

-Estimator and Keras are both part of Tensorflow. These high level APIs are designed
-to make building models easier. In our distributed training example we will show how both
+Estimator and Keras are both part of TensorFlow. These high-level APIs are designed
+to make building models easier. In our distributed training example, we will show how both
 APIs work together to help build models that will be trainable in both single node and
 distributed manner.

 ## Keras and Estimators

-Code required to run this example can be found in [distributed](https://github.com/kubeflow/examples/tree/master/github_issue_summarization/distributed) directory.
+Code required to run this example can be found in the
+[distributed](https://github.com/kubeflow/examples/tree/master/github_issue_summarization/distributed)
+directory.

 You can read more about Estimators [here](https://www.tensorflow.org/guide/estimators).
 In our example we will leverage `model_to_estimator` function that allows to turn existing tf.keras model to estimator, and therefore allow it to
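Before starting a training run, it can save time to verify the ReadWriteMany requirement by provisioning a small RWX volume. A sketch; `nfs-storage` is an assumed StorageClass name, so substitute whatever RWX-capable class your cluster provides:

```bash
# Attempt to provision a small ReadWriteMany volume as a smoke test.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-probe
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-storage   # assumed name; use your RWX-capable class
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc rwx-probe    # should eventually report STATUS Bound
kubectl delete pvc rwx-probe # clean up after the check
```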
@@ -93,3 +96,7 @@ tool for us. Please refer to [documentation](https://www.tensorflow.org/guide/pr
 ## Model

 After training is complete, our model can be found in "model" PVC.
+
+*Next*: [Serving the model](03_serving_the_model.md)
+
+*Back*: [Setup a kubeflow cluster](01_setup_a_kubeflow_cluster.md)
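To browse the files on that PVC after training, one option is a throwaway pod that mounts it. A sketch; the claim name `model` follows the text above, and everything else is illustrative:

```bash
# Mount the "model" PVC in a temporary busybox pod and list its contents.
kubectl run model-inspect --rm -it --restart=Never --image=busybox \
  --overrides='{
    "apiVersion": "v1",
    "spec": {
      "containers": [{
        "name": "model-inspect",
        "image": "busybox",
        "command": ["ls", "-lR", "/model"],
        "volumeMounts": [{"name": "model", "mountPath": "/model"}]
      }],
      "volumes": [{"name": "model", "persistentVolumeClaim": {"claimName": "model"}}]
    }
  }'
```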
18 changes: 9 additions & 9 deletions github_issue_summarization/02_training_the_model.md
@@ -1,14 +1,14 @@
-# Training the model
+# Training the model with a notebook

-By this point, you should have a Jupyter Notebook running at http://127.0.0.1:8000.
+By this point, you should have a Jupyter notebook running at http://127.0.0.1:8000.

 ## Download training files

-Open the Jupyter Notebook interface and create a new Terminal by clicking on
-menu, *New -> Terminal*. In the Terminal, clone this git repo by executing: `
+Open the Jupyter notebook interface and create a new Terminal by clicking on
+the menu, *New -> Terminal*. In the Terminal, clone this git repo by executing:

-```commandline
-git clone https://github.com/kubeflow/examples.git`
+```bash
+git clone https://github.com/kubeflow/examples.git
 ```

 Now you should have all the code required to complete training in the `examples/github_issue_summarization/notebooks` folder. Navigate to this folder.
@@ -19,7 +19,7 @@ Here you should see two files:

 ## Perform training

-Open th `Training.ipynb` notebook. This contains a complete walk-through of
+Open the `Training.ipynb` notebook. This contains a complete walk-through of
 downloading the training data, preprocessing it, and training it.

 Run the `Training.ipynb` notebook, viewing the output at each step to confirm
@@ -44,9 +44,9 @@ kubectl --namespace=${NAMESPACE} cp ${PODNAME}:/home/jovyan/examples/github_issu
 kubectl --namespace=${NAMESPACE} cp ${PODNAME}:/home/jovyan/examples/github_issue_summarization/notebooks/title_pp.dpkl .
 ```

-For information on:
+_(Optional)_ You can also perform training with two alternate methods:
 - [Training the model using TFJob](02_training_the_model_tfjob.md)
-- [Distributed training using tensor2tensor](02_tensor2tensor_training.md)
+- [Distributed training using Estimator](02_distributed_training.md)

 *Next*: [Serving the model](03_serving_the_model.md)

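The `${PODNAME}` in the `kubectl cp` commands above refers to the notebook pod spawned by JupyterHub, which is typically named `jupyter-<username>` (for example, `jupyter-chasm` in the pod listing earlier). One way to set it, sketched under that naming assumption:

```bash
# Pick the first spawned notebook pod, skipping the JupyterHub pod itself (jupyter-0).
PODNAME=$(kubectl get pods -n=${NAMESPACE} -o=name \
  | grep "jupyter-" | grep -v "jupyter-0" | head -n 1 | cut -d/ -f2)
echo ${PODNAME}
```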
74 changes: 39 additions & 35 deletions github_issue_summarization/02_training_the_model_tfjob.md
@@ -1,32 +1,35 @@
 # Training the model using TFJob

-Kubeflow offers a TensorFlow job controller for kubernetes. This allows you to run your distributed Tensorflow training
-job on a kubernetes cluster. For this training job, we will read our training data from GCS and write our output model
+Kubeflow offers a TensorFlow job controller for Kubernetes. This allows you to run your distributed TensorFlow training
+job on a Kubernetes cluster. For this training job, we will read our training
+data from Google Cloud Storage (GCS) and write our output model
 back to GCS.

 ## Create the image for training

-The [notebooks](notebooks) directory contains the necessary files to create a image for training. The [train.py](notebooks/train.py) file contains the training code. Here is how you can create an image and push it to gcr.
+The [notebooks](notebooks) directory contains the necessary files to create an
+image for training. The [train.py](notebooks/train.py) file contains the
+training code. Here is how you can create an image and push it to Google
+Container Registry (GCR):

-```commandline
+```bash
 cd notebooks/
 make PROJECT=${PROJECT} set-image
 ```
 ## Train Using PVC

-If you don't have access to GCS or don't want to use GCS you
-can use a persistent volume to store the data and model.
+If you don't have access to GCS or do not wish to use GCS, you
+can use a Persistent Volume Claim (PVC) to store the data and model.

-Create a pvc
+Note: your cluster must have a default storage class defined for this to work.
+Create a PVC:

 ```
 ks apply --env=${KF_ENV} -c data-pvc
 ```

-* Your cluster must have a default storage class defined for
-this to work.
-
-Run the job to download the data to the PVC.
+Run the job to download the data to the PVC:

 ```
 ks apply --env=${KF_ENV} -c data-downloader
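Before submitting the training job, it is worth confirming that the download finished. A sketch; the job and PVC names below are taken from the component names above and may differ, so check `ks show ${KF_ENV} -c data-downloader` if they do:

```bash
# Confirm the PVC is bound and the downloader job ran to completion.
kubectl get pvc -n=${NAMESPACE}
kubectl get jobs -n=${NAMESPACE}
kubectl logs -n=${NAMESPACE} job/data-downloader   # assumes the job is named data-downloader
```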
@@ -38,24 +41,24 @@ Submit the training job
 ks apply --env=${KF_ENV} -c tfjob-pvc
 ```

-The resulting model will be stored on PVC so to access it you will
-need to run a pod and attach the PVC. For serving you can just
-attach it the pod serving the model.
+The resulting model will be stored on the PVC, so to access it you will
+need to run a pod and attach the PVC. For serving, you can just
+attach it to the pod serving the model.

 ## Training Using GCS

-If you are running on GCS you can train using GCS to store the input
+If you are using GCS, you can train using GCS to store the input
 and the resulting model.

-### GCS Service account
+### GCS service account

-* Create a service account which will be used to read and write data from the GCS Bucket.
+* Create a service account that will be used to read and write data from the GCS bucket.

-* Give the storage account `roles/storage.admin` role so that it can access GCS Buckets.
+* Give the storage account `roles/storage.admin` role so that it can access GCS buckets.

 * Download its key as a json file and create a secret named `user-gcp-sa` with the key `user-gcp-sa.json`

-```commandline
+```bash
 SERVICE_ACCOUNT=github-issue-summarization
 PROJECT=kubeflow-example-project # The GCP Project name
 gcloud iam service-accounts --project=${PROJECT} create ${SERVICE_ACCOUNT} \
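Once the secret exists, a quick sanity check confirms that the service account was created and the key landed under the expected name. A sketch reusing the variables above:

```bash
# The secret should contain a single key named user-gcp-sa.json.
kubectl get secret user-gcp-sa -n=${NAMESPACE} -o jsonpath='{.data}' | cut -c1-80; echo
# Verify the service account exists in the project.
gcloud iam service-accounts describe \
  ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com --project=${PROJECT}
```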
@@ -74,12 +77,12 @@ kubectl --namespace=${NAMESPACE} create secret generic user-gcp-sa --from-file=u

 ### Run the TFJob using your image

-[ks-kubeflow](ks-kubeflow) contains a ksonnet app to deploy the TFJob.
+[ks_app](ks_app) contains a ksonnet app to deploy the TFJob.

-Set the appropriate params for the tfjob component
+Set the appropriate params for the tfjob component:

-```commandline
-cd ks-kubeflow
+```bash
+cd ks_app
 ks param set tfjob namespace ${NAMESPACE} --env=${KF_ENV}

 # The image pushed in the previous step
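Before deploying, you can review what the component is configured with. A sketch; the `ks param list` output format varies a bit across ksonnet versions:

```bash
# Show the parameters currently set on the tfjob component for this environment.
ks param list tfjob --env=${KF_ENV}
```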
@@ -97,30 +100,31 @@ ks param set tfjob output_model_gcs_path "github-issue-summarization-data/output

 Deploy the app:

-```commandline
+```bash
 ks apply ${KF_ENV} -c tfjob
 ```

 In a while you should see a new pod with the label `tf_job_name=tf-job-issue-summarization`
-```commandline
-kubectl get pods -n=${NAMESPACE} -ltf_job_name=tf-job-issue-summarization
+```bash
+kubectl get pods -n=${NAMESPACE} tfjob-issue-summarization-master-0
 ```

-You can view the logs of the tf-job operator using
+You can view the training logs using

-```commandline
-kubectl logs -f $(kubectl get pods -n=${NAMESPACE} -lname=tf-job-operator -o=jsonpath='{.items[0].metadata.name}')
+```bash
+kubectl logs -f -n=${NAMESPACE} tfjob-issue-summarization-master-0
 ```

-You can view the actual training logs using
+You can view the logs of the tf-job operator using

-```commandline
-kubectl logs -f $(kubectl get pods -n=${NAMESPACE} -ltf_job_name=tf-job-issue-summarization -o=jsonpath='{.items[0].metadata.name}')
+```bash
+kubectl logs -f -n=${NAMESPACE} $(kubectl get pods -n=${NAMESPACE} -lname=tf-job-operator -o=jsonpath='{.items[0].metadata.name}')
 ```

-For information on:
-- [Training the model](02_training_the_model.md)
-- [Distributed training using tensor2tensor](02_tensor2tensor_training.md)

+_(Optional)_ You can also perform training with two alternate methods:
+- [Training the model with a notebook](02_training_the_model.md)
+- [Distributed training using Estimator](02_distributed_training.md)

 *Next*: [Serving the model](03_serving_the_model.md)

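Beyond pod logs, the TFJob custom resource itself records training status and events. A sketch of checking it; the resource name `tfjob-issue-summarization` mirrors the pod names above and is an assumption:

```bash
# Inspect the TFJob custom resource; the Status and Events sections show progress and failures.
kubectl get tfjobs -n=${NAMESPACE}
kubectl describe tfjob tfjob-issue-summarization -n=${NAMESPACE}
```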