Update the update_index.sh (kubeflow#373)
* add search index creator container

* add pipeline

* update op name

* update readme

* update scripts

* typo fix

* Update Makefile

* Update Makefile

* address comments

* fix ks

* update pipeline

* restructure the images

* remove echo

* update image

* add code embedding launcher

* small fixes

* format

* format

* address comments

* add flag

* Update arguments.py

* update parameter

* revert to use --wait_until_finished. --wait_until_finish never works

* update image

* update git script

* update script

* update readme
IronPan authored and k8s-ci-robot committed Nov 29, 2018
1 parent 6855802 commit 3799bac
Showing 3 changed files with 52 additions and 25 deletions.
4 changes: 2 additions & 2 deletions code_search/docker/ks/Dockerfile
@@ -1,6 +1,6 @@
FROM ubuntu:xenial
FROM debian:7

RUN apt-get update && apt-get install -y wget &&\
RUN apt-get update && apt-get install -y wget ca-certificates git-core &&\
rm -rf /var/lib/apt/lists/*

RUN wget -O /tmp/hub-linux-amd64-2.6.0.tgz https://github.com/github/hub/releases/download/v2.6.0/hub-linux-amd64-2.6.0.tgz && \
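The added `ca-certificates` and `git-core` packages are presumably there so that HTTPS downloads and `git clone` work inside the container, since update_index.sh invokes `git`, `ks`, and `hub`. A minimal sketch of a check that such binaries are on the `PATH`; `require_tools` is a hypothetical helper and is not part of the image or the script:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the Dockerfile or update_index.sh.

# Report every command that is missing and return non-zero if any is absent.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "${tool}" > /dev/null 2>&1; then
      echo "missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example call inside the container (tool names taken from update_index.sh):
#   require_tools git hub ks wget
```

Running such a check at the top of the script would fail fast with a readable message instead of stopping halfway through the git workflow.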
37 changes: 17 additions & 20 deletions code_search/docker/ks/update_index.sh
@@ -3,23 +3,20 @@
# This script creates a PR updating the nmslib index used by search-index-server.
# It uses ks CLI to update the parameters.
# After creating and pushing a commit it uses the hub github CLI to create a PR.
#
# The argument --base can be used to change the owner/org of the repo the PR is opened on.
# To use the main kubeflow/examples repo use
# --base=kubeflow:master
#
# To use user alex's fork use
# --base=alex:master
set -ex

DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" > /dev/null && pwd)"

branch=master

usage() {
echo "Usage: update_index.sh --base=OWNER:branch --appDir=<ksonnet app dir> --env=<ksonnet environment> --indexFile=<index file> --lookupFile=<lookup file>"
echo "Usage: update_index.sh --branch=<base branch> --appDir=<ksonnet app dir>
--gitRepo=<github repo with Argo CD hooked up> --env=<ksonnet environment> --indexFile=<index file>
--lookupFile=<lookup file> --workflowId=<workflow id invoking the container>"
}

# List of required parameters
names=(appDir env lookupFile indexFile base)
names=(appDir gitRepo env lookupFile indexFile workflowId)

source "${DIR}/parse_arguments.sh"
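The required-parameter list is consumed by parse_arguments.sh, which this diff does not show. As a rough sketch of what such a helper might do, assuming every argument takes the `--name=value` form seen in the usage string: `parse_arguments` and `check_required` below are hypothetical names and are not the contents of the real file.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for parse_arguments.sh; the real file is not part of this diff.

# Turn each --name=value argument into a shell variable called `name`.
parse_arguments() {
  local _arg _name _value
  for _arg in "$@"; do
    if [[ "${_arg}" =~ ^--([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then
      _name="${BASH_REMATCH[1]}"
      _value="${BASH_REMATCH[2]}"
      # printf -v assigns to the variable whose name is stored in _name.
      printf -v "${_name}" '%s' "${_value}"
    else
      echo "Unrecognized argument: ${_arg}" >&2
      return 1
    fi
  done
}

# Return non-zero if any of the named variables is unset or empty.
check_required() {
  local _name
  for _name in "$@"; do
    if [[ -z "${!_name:-}" ]]; then
      echo "Missing required argument: --${_name}" >&2
      return 1
    fi
  done
}

# Possible use with the list from this commit:
#   names=(appDir gitRepo env lookupFile indexFile workflowId)
#   parse_arguments "$@" && check_required "${names[@]}" || { usage; exit 1; }
```

Only the first `=` separates name from value, so values that themselves contain `=` survive intact.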

Expand All @@ -31,22 +28,19 @@ if [ -z ${dryrun} ]; then
dryrun=false
fi

cd ${appDir}

git config --global user.email pipeline@localhost
git clone -b ${branch} https://${GITHUB_TOKEN}@github.com/${gitRepo}.git repo && cd repo/${appDir}
git config credential.helper store
git checkout -b ${workflowId}
ks param set --env=${env} search-index-server indexFile ${indexFile}
ks param set --env=${env} search-index-server lookupFile ${lookupFile}
git add .

if (! ${dryrun}); then
git commit -m "Update the lookup and index file."
git push
else
echo "dryrun; not committing to git."
fi
git add . && git commit -m "Update the lookup and index file."

FILE=$(mktemp tmp.create_pull_request.XXXX)

cat <<EOF >$FILE
Update the lookup and index file.
Update the lookup and index file by pipeline ${workflowId}
This PR is automatically generated by update_index.sh.
@@ -56,5 +50,8 @@ EOF

# Create a pull request
if (! ${dryrun}); then
hub pull-request --base=${base} -F ${FILE}
git push origin ${workflowId}
hub pull-request --base=${gitRepo}:${branch} -F ${FILE}
else
echo "dry run; not committing to git."
fi
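With this change the commit is always created locally, and only the push and the pull request are skipped on a dry run. The same gating can be sketched with a small wrapper that prints commands instead of running them. `run` and `publish_update` are hypothetical helpers, not part of update_index.sh; the `--base` value mirrors the `${gitRepo}:${branch}` form used in the script.

```bash
#!/usr/bin/env bash
# Sketch only: `run` and `publish_update` are hypothetical helpers.

# Execute a command, or just print it when dryrun=true.
run() {
  if [[ "${dryrun:-false}" == "true" ]]; then
    echo "dry run: $*"
  else
    "$@"
  fi
}

# Push the workflow branch and open a PR against <git_repo>:<branch>.
publish_update() {
  local git_repo="$1" branch="$2" workflow_id="$3" message_file="$4"
  run git push origin "${workflow_id}"
  run hub pull-request --base="${git_repo}:${branch}" -F "${message_file}"
}

# Example:
#   dryrun=true
#   publish_update kubeflow/examples master my-workflow-id /tmp/pr_message
```

Comparing `dryrun` against the string `true` avoids executing the variable's value as a command, which is what `if (! ${dryrun})` does.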
36 changes: 33 additions & 3 deletions code_search/pipeline/README.md
@@ -1,12 +1,42 @@
To run the pipeline, follow the kubeflow pipeline instruction and compile index_update_pipeline.py and upload to pipeline
## Overview
This directory shows how to build a scheduled pipeline that periodically rebuilds the search index and updates the search UI
to use the new index. The search UI's Kubernetes spec is stored in GitHub, and Argo CD is hooked up to that repository so that
the search UI is updated automatically.

At a high level, the pipeline automates the following steps:
1. Compute the function embeddings
2. Create a new search index file
3. Update the GitHub manifest to point to the new search index file

Argo CD then triggers a new service deployment with the new manifest.

## Prerequisites
- A cluster with Kubeflow deployed, including [Kubeflow Pipelines](https://github.com/kubeflow/pipelines)
- A pre-trained code search model


## Instructions
1. Upload the ks-web-app/ directory to a GitHub repository, and set up Argo CD following the
[instructions](https://github.com/argoproj/argo-cd/blob/master/docs/getting_started.md#6-create-an-application-from-a-git-repository-location).
Set up [automated sync](https://github.com/argoproj/argo-cd/blob/master/docs/auto_sync.md) if you want the search UI to
be updated in real time. Otherwise Argo CD pulls the latest config every 3 minutes by default.
2. Create a GitHub token following these [instructions](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/#creating-a-token)
and store it in the cluster as a secret. This allows the pipeline to update GitHub. The secret is stored in the kubeflow namespace, assuming
that is the namespace in which Kubeflow is deployed.
```bash
kubectl create secret generic github-access-token --from-literal=token=[your_github_token] -n kubeflow
```
3. To run the pipeline, follow the Kubeflow Pipelines instructions to compile index_update_pipeline.py and upload it to the pipeline
page.

Provide the parameters, e.g.

```
PROJECT='code-search-demo'
CLUSTER_NAME='cs-demo-1103'
WORKING_DIR='gs://code-search-demo/pipeline'
SAVED_MODEL_DIR='gs://code-search-demo/models/20181107-dist-sync-gpu/export/1541712907/'
DATA_DIR='gs://code-search-demo/20181104/data'
```
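The three directory parameters in the example are all Cloud Storage paths. A quick sanity check before submitting a run could look like the sketch below, under the assumption that these parameters must always be `gs://` URIs (the README does not state this explicitly). `is_gcs_uri` is a hypothetical helper, and the bucket pattern is deliberately loose rather than a full validation of bucket naming rules.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; the pipeline itself does not ship this.

# Succeed when the argument looks like gs://<bucket>[/<path>].
is_gcs_uri() {
  local pattern='^gs://[a-z0-9][a-z0-9._-]*(/.*)?$'
  [[ "$1" =~ ${pattern} ]]
}

# Values copied from the README example.
WORKING_DIR='gs://code-search-demo/pipeline'
SAVED_MODEL_DIR='gs://code-search-demo/models/20181107-dist-sync-gpu/export/1541712907/'
DATA_DIR='gs://code-search-demo/20181104/data'

for name in WORKING_DIR SAVED_MODEL_DIR DATA_DIR; do
  if ! is_gcs_uri "${!name}"; then
    echo "${name} is not a gs:// URI: ${!name}" >&2
    exit 1
  fi
done
```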
```

TODO(IronPan): more details on how to run pipeline
