Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final readme fixes #16

Merged
merged 4 commits into from
Dec 7, 2017
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Adding a sample configuration
  • Loading branch information
foxish committed Dec 7, 2017
commit 8d0d71a9323c60791ea49e2189dc8183568f83fd
33 changes: 31 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,9 +93,38 @@ unsecured endpoint. For a production deployment, refer to the [documentation](ju
### Training

The TFJob controller takes a YAML specification for a master, parameter servers, and workers to help run [distributed tensorflow](https://www.tensorflow.org/deploy/distributed). The quick start deploys a TFJob controller and installs a new `tensorflow.org/v1alpha1` API type.
You can create new Tensorflow Training deployments by submitting a specification to the aforementioned API. For examples, look under the [tf-controller-examples/](https://github.com/google/kubeflow/tree/master/tf-controller-examples) directory.
You can create new Tensorflow Training deployments by submitting a specification to the aforementioned API.

Refer to the README in the [tensorflow/k8s](https://github.com/tensorflow/k8s) repository for more information on using the TfJob controller to run TensorFlow jobs on K8s.
An example specification looks like the following:

```
apiVersion: "tensorflow.org/v1alpha1"
kind: "TfJob"
metadata:
name: "example-job"
spec:
replicaSpecs:
- replicas: 1
tfReplicaType: MASTER
template:
spec:
containers:
- image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
name: tensorflow
restartPolicy: OnFailure
- replicas: 1
tfReplicaType: WORKER
template:
spec:
containers:
- image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
name: tensorflow
restartPolicy: OnFailure
- replicas: 2
tfReplicaType: PS
```

For runnable examples, look under the [tf-controller-examples/](https://github.com/google/kubeflow/tree/master/tf-controller-examples) directory. Detailed documentation can be found in the [tensorflow/k8s](https://github.com/tensorflow/k8s) repository for more information on using the TfJob controller to run TensorFlow jobs on Kubernetes.

### Serve Model

Expand Down