kubeflow/training-operator

Kubeflow Training Operator

Overview

Up to and including the v1.2 release, tf-operator supported only TFJob on Kubernetes. Starting from v1.3, Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/MXNet/XGBoost jobs on Kubernetes.

Prerequisites

  • Kubernetes version >= 1.16
  • Kustomize version >= 3.x
  • kubectl version >= 1.21.x
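
As a rough aid for checking these minimums, the sketch below compares two version strings. The helper name `version_ge` and the sample version are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils version sort); it is not part of the operator tooling.

```shell
# version_ge A B: succeed when version A is at least version B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value; read the real server version from `kubectl version`.
server_version="1.18.3"

if version_ge "$server_version" "1.16"; then
  echo "Kubernetes $server_version meets the 1.16 minimum"
else
  echo "Kubernetes $server_version is older than 1.16"
fi
```

The same helper can be reused for the Kustomize and kubectl minimums by passing their versions instead.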

Installation

Master Branch

kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=master"

Specific Release

kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.3.0"

TensorFlow Release Only

For users who prefer the original TensorFlow controllers, please check out the v1.2-branch; bug fixes will continue to be maintained on this branch.

kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.2.0"
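
The install commands in this section differ only in the `ref` query parameter (a branch or a tag). A minimal sketch that builds the kustomize URL from a ref is shown here; the function name `training_operator_url` is hypothetical, and the `kubectl` lines are left as comments because they need a live cluster.

```shell
# Print the standalone-overlay kustomize URL for a given git ref.
# Defaults to the master branch when no ref is passed.
training_operator_url() {
  ref="${1:-master}"
  printf 'github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=%s\n' "$ref"
}

# Show the URL that would be applied for the v1.3.0 release.
training_operator_url v1.3.0

# To install against a cluster (not run here):
#   kubectl apply -k "$(training_operator_url v1.3.0)"
# To list the custom resource definitions in the kubeflow.org API group afterwards:
#   kubectl get crd | grep kubeflow.org
```

Pinning a tag rather than `master` keeps the installed manifests reproducible across runs.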

Quick Start

Please refer to the quick-start-v1.md and Kubeflow Training User Guide for more information.
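
For a first impression of what a job looks like, here is a minimal sketch that writes a TFJob manifest to a local file. The job name, file name, and container image are placeholders, not values from this repository; the guides above remain the reference for complete, tested examples.

```shell
# Write a minimal two-worker TFJob manifest to a local file.
# The metadata name and the image are hypothetical placeholders.
cat > tfjob-example.yaml <<'EOF'
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: tfjob-example
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow   # the TFJob controller looks for a container with this name
              image: example.com/my-training-image:latest
EOF

# To submit and inspect the job on a cluster with the operator installed (not run here):
#   kubectl apply -f tfjob-example.yaml
#   kubectl get tfjobs
```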

API Documentation

Please refer to API Documentation.

Community

This is a part of Kubeflow, so please see the README in kubeflow/kubeflow to get in touch with the community.

Contributing

Please refer to DEVELOPMENT.

Change Log

Please refer to CHANGELOG.

Version Matrix

The following table lists the most recent versions of the operator.

Operator Version        API Version    Kubernetes Version
v1.0.x                  v1             1.16+
v1.1.x                  v1             1.16+
v1.2.x                  v1             1.16+
v1.3.x                  v1             1.18+
latest (master HEAD)    v1             1.18+