[mnist] Add support for S3 in TensorBoard component; Update docs. (kubeflow#499)

* [mnist] Add support for S3 in TensorBoard component; Update docs.

* [mnist] reverted autonumbering in README

* [mnist] add expected fail for predict_test, until it's fixed
Oleg Shepetjuk authored and jlewi committed Feb 20, 2019
1 parent 45d157f commit 90ea8cb
Showing 5 changed files with 179 additions and 29 deletions.
159 changes: 150 additions & 9 deletions mnist/README.md
@@ -330,9 +330,18 @@ kubectl logs -f mnist-train-dist-chief-0
#### Using S3
To use S3 we need to configure TensorFlow to use S3 credentials and variables. These credentials will be provided as Kubernetes secrets, and the variables will be passed in as environment variables. Modify the below values to suit your environment.
To use S3 we need to configure TensorFlow to use S3 credentials and variables. These credentials will be provided as Kubernetes secrets and the variables will be passed in as environment variables. Modify the below values to suit your environment.
Give the job a different name (to distinguish it from your job which didn't use GCS)
Let's start by creating an environment to store parameters particular to writing the model to S3
and running distributed.
```
KSENV=distributed
cd ks_app
ks env add ${KSENV}
```
Give the job a different name (to distinguish it from your job which didn't use S3)
```
ks param set --env=${KSENV} train name mnist-train-dist
@@ -432,7 +441,8 @@ various environment variables configuring access to S3.
ks param set --env=${KSENV} train envVariables ${AWSENV}
```
* If we look at the spec for our job we can see that the environment variable `AWS_BUCKET` is set.
* If we look at the spec for our job we can see that the environment variables related
to S3 are set.
```
ks show ${KSENV} -c train
@@ -452,7 +462,9 @@ various environment variables configuring access to S3.
..
env:
...
- name: AWS_BUCKET
- name: AWS_REGION
value: us-west-2
- name: BUCKET_NAME
value: somebucket
...
...
@@ -484,13 +496,12 @@ There are various ways to monitor workflow/training job. In addition to using `k
### Tensorboard
TODO: This section needs to be updated
#### Using GCS
Configure TensorBoard to point to your model location
```
ks param set tensorboard --env=${KSENV} logDir ${LOGDIR}

```
If you followed the directions above and used GCS, you can use the following value
@@ -499,18 +510,148 @@ Assuming you followed the directions above if you used GCS you can use the follo
LOGDIR=gs://${BUCKET}/${MODEL_PATH}
```
Then you can deploy tensorboard
You need to point TensorBoard to GCP credentials so it can access the GCS bucket containing the model.
1. Mount the secret into the pod
```
ks param set --env=${KSENV} tensorboard secret user-gcp-sa=/var/secrets
```
* Setting this ksonnet parameter causes a volumeMount and volume to be added to the TensorBoard
deployment
* To see this you can run
```
ks show ${KSENV} -c tensorboard
```
* The output should now include a volumeMount and volume section
1. Next we need to set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` so that our code knows
where to look for the service account key.
```
ks param set --env=${KSENV} tensorboard envVariables GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json
```
* If we look at the spec for the TensorBoard deployment we can see that the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set.
```
ks show ${KSENV} -c tensorboard
```
```
...
env:
...
- name: GOOGLE_APPLICATION_CREDENTIALS
  value: /var/secrets/user-gcp-sa.json
```
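The `secret` parameter packs a secret name and a mount path into a single `name=mountPath` string. The Python sketch below mirrors the split done in `tensorboard.jsonnet` (`std.split(params.secret, "=")`) to show which volumeMount and volume result from a given value. The function name `secret_to_volume_spec` is hypothetical, and this is only an illustration of the logic, not the component's actual code.

```python
def secret_to_volume_spec(secret):
    """Split 'name=mountPath' and build the volumeMounts/volumes lists.

    Mirrors the conditionals in tensorboard.jsonnet: an empty mount path
    yields no volumeMounts, and an empty name yields no volumes.
    """
    pieces = secret.split("=")
    name = pieces[0] if len(pieces) > 0 else ""
    mount_path = pieces[1] if len(pieces) > 1 else ""

    volume_mounts = (
        [{"name": name, "mountPath": mount_path, "readOnly": True}]
        if mount_path != "" else []
    )
    volumes = (
        [{"name": name, "secret": {"secretName": name}}]
        if name != "" else []
    )
    return {"volumeMounts": volume_mounts, "volumes": volumes}


# The value used in the step above.
spec = secret_to_volume_spec("user-gcp-sa=/var/secrets")
print(spec["volumeMounts"])
print(spec["volumes"])
```

With the default empty `secret` value, both lists come back empty, so no volume is added to the deployment.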
#### Using S3
Configure TensorBoard to point to your model location
```
ks param set tensorboard --env=${KSENV} logDir ${LOGDIR}
```
If you followed the directions above and used S3, you can use the following value
```
LOGDIR=s3://${BUCKET}/${MODEL_PATH}
```
You need to point TensorBoard to AWS credentials so it can access the S3 bucket containing the model.
1. Pass secrets as environment variables into the pod
```
ks param set --env=${KSENV} tensorboard secretKeyRefs AWS_ACCESS_KEY_ID=aws-creds.awsAccessKeyID,AWS_SECRET_ACCESS_KEY=aws-creds.awsSecretAccessKey
```
* Setting this ksonnet parameter causes two new environment variables to be added to the TensorBoard
deployment
* To see this you can run
```
ks show ${KSENV} -c tensorboard
```
* The output should now include two environment variables referencing the K8s secret
```
...
spec:
  containers:
  - command:
    ...
    env:
    ...
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          key: awsAccessKeyID
          name: aws-creds
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: awsSecretAccessKey
          name: aws-creds
...
```
1. Next we need to set several S3-related environment variables so that TensorBoard
knows how to talk to S3
```
AWSENV="S3_ENDPOINT=${S3_ENDPOINT}"
AWSENV="${AWSENV},AWS_ENDPOINT_URL=${AWS_ENDPOINT_URL}"
AWSENV="${AWSENV},AWS_REGION=${AWS_REGION}"
AWSENV="${AWSENV},BUCKET_NAME=${BUCKET_NAME}"
AWSENV="${AWSENV},S3_USE_HTTPS=${S3_USE_HTTPS}"
AWSENV="${AWSENV},S3_VERIFY_SSL=${S3_VERIFY_SSL}"
ks param set --env=${KSENV} tensorboard envVariables ${AWSENV}
```
* If we look at the spec for the TensorBoard deployment we can see that the environment variables related to S3 are set.
```
ks show ${KSENV} -c tensorboard
```
```
...
spec:
  containers:
  - command:
    ..
    env:
    ...
    - name: AWS_REGION
      value: us-west-2
    - name: BUCKET_NAME
      value: somebucket
    ...
```
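To make the two string formats above concrete, here is a rough Python sketch of how a comma-separated `envVariables` value and a `secretKeyRefs` value could be expanded into the `env` entries shown in the output. `util.libsonnet` is not shown here, so the exact behavior of `util.parseEnv` and `util.parseSecrets` is an assumption; the names `parse_env` and `parse_secret_key_refs` are hypothetical stand-ins.

```python
def parse_env(env_variables):
    """Turn 'K1=v1,K2=v2' into [{'name': 'K1', 'value': 'v1'}, ...]."""
    entries = []
    for item in env_variables.split(","):
        if not item:
            continue  # an empty parameter produces no entries
        name, _, value = item.partition("=")
        entries.append({"name": name, "value": value})
    return entries


def parse_secret_key_refs(secret_key_refs):
    """Turn 'VAR=secret.key,...' into env entries that use secretKeyRef."""
    entries = []
    for item in secret_key_refs.split(","):
        if not item:
            continue
        name, _, ref = item.partition("=")
        secret_name, _, key = ref.partition(".")
        entries.append({
            "name": name,
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
        })
    return entries


# Plain variables come first, followed by the secret-backed ones.
env = parse_env("AWS_REGION=us-west-2,BUCKET_NAME=somebucket") + parse_secret_key_refs(
    "AWS_ACCESS_KEY_ID=aws-creds.awsAccessKeyID,"
    "AWS_SECRET_ACCESS_KEY=aws-creds.awsSecretAccessKey"
)
for entry in env:
    print(entry)
```

The concatenation order follows the `env: util.parseEnv(params.envVariables) + tbSecrets` line in `tensorboard.jsonnet`.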
#### Deploying TensorBoard
Now you can deploy TensorBoard
```
ks apply ${KSENV} -c tensorboard
```
To access tensorboard using port-forwarding
To access TensorBoard using port-forwarding
```
kubectl -n jlewi port-forward service/tensorboard-tb 8090:80
```
Tensorboard can now be accessed at [http://127.0.0.1:8090](http://127.0.0.1:8090).
TensorBoard can now be accessed at [http://127.0.0.1:8090](http://127.0.0.1:8090).
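If you want to script the check that the port-forward is up, a small polling helper along these lines may be useful. This is only a sketch using the Python standard library; the function name `wait_for_http` and the timeout values are assumptions, not part of the example.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=30.0, interval=1.0):
    """Return True once `url` answers with HTTP 200, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # port-forward not ready yet, or nothing is listening
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://127.0.0.1:8090")` would return `True` once TensorBoard responds through the forwarded port.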
## Serving the model
3 changes: 3 additions & 0 deletions mnist/ks_app/components/params.libsonnet
@@ -59,9 +59,12 @@
trafficRule: 'v1:100',
},
"tensorboard": {
envVariables: 'GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json',
image: "tensorflow/tensorflow:1.11.0",
logDir: "gs://example/to/model/logdir",
name: "tensorboard",
secret: '',
secretKeyRefs: '',
},
"web-ui": {
containerPort: 5000,
44 changes: 25 additions & 19 deletions mnist/ks_app/components/tensorboard.jsonnet
@@ -4,6 +4,8 @@
local env = std.extVar("__ksonnet/environments");
local params = std.extVar("__ksonnet/params").components.tensorboard;

local util = import "util.libsonnet";

local k = import "k.libsonnet";

local name = params.name;
@@ -49,6 +51,12 @@ local service = {
},
};

local tbSecrets = util.parseSecrets(params.secretKeyRefs);

local secretPieces = std.split(params.secret, "=");
local secretName = if std.length(secretPieces) > 0 then secretPieces[0] else "";
local secretMountPath = if std.length(secretPieces) > 1 then secretPieces[1] else "";

local deployment = {
apiVersion: "apps/v1beta1",
kind: "Deployment",
@@ -82,29 +90,27 @@ local deployment = {
containerPort: 80,
},
],
env: [
env: util.parseEnv(params.envVariables) + tbSecrets,
volumeMounts: if secretMountPath != "" then
[
{
name: "GOOGLE_APPLICATION_CREDENTIALS",
value: "/secret/gcp-credentials/user-gcp-sa.json",
name: secretName,
mountPath: secretMountPath,
readOnly: true,
},
],
volumeMounts: [
{
mountPath: "/secret/gcp-credentials",
name: "gcp-credentials",
},
],
},
],

volumes: [
{
name: "gcp-credentials",
secret: {
secretName: "user-gcp-sa",
},
] else [],
},
],
volumes:
if secretName != "" then
[
{
name: secretName,
secret: {
secretName: secretName,
},
},
] else [],
},
},
},
1 change: 0 additions & 1 deletion mnist/ks_app/components/train.jsonnet
@@ -43,7 +43,6 @@ local trainEnv = [
},
];

// AWS Access/Secret keys
local trainSecrets = util.parseSecrets(params.secretKeyRefs);

local secretPieces = std.split(params.secret, "=");
1 change: 1 addition & 0 deletions mnist/testing/predict_test.py
@@ -66,6 +67,7 @@ def send_request(*args, **kwargs):

return r

@pytest.mark.xfail
def test_predict(master, namespace, service):
app_credentials = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
if app_credentials:
