This repository has been archived by the owner on May 11, 2024. It is now read-only.
As discussed in our recent meeting, kubeflow/kubeflow#151 (comment) requires a way to expose data from Pachyderm to a TFJob. Moreover, this type of data access pattern would be useful for integrating any distributed training framework (e.g., SparkML) or other resource into a Pachyderm pipeline.
In our discussion, we proposed creating a source type for exposing data from the versioned Pachyderm file system (which is backed by an object store). I suggest this format:
apiVersion: kvc.kubeflow.org/v1
kind: VolumeManager
metadata:
  name: kvc-example1
  namespace: <insert-namespace-here>
spec:
  volumeConfigs:
    - id: "vol1"
      replicas: 1
      sourceType: "PFS"
      sourceRepo: <insert input repo name or names here>
      sourceBranch: <insert input repo branch here, e.g., "master">
      accessMode: "ReadWriteOnce"
      capacity: 5Gi
      labels:
        key1: val1
        key2: val2
      options:
        pachSecretName: <insert-secret-name-for-pach-auth-and-host>
This would allow the connector to use the Pachyderm client to pull the necessary data into the volume.
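To make the proposal a bit more concrete, here is a rough sketch of how a connector might turn one `volumeConfigs` entry into the list of repo/branch pairs it needs to pull. This is not existing KVC or Pachyderm code: the `pfs_sources` helper is hypothetical, and it assumes that multiple repos are passed either as a list or as a comma-separated string, that a missing `sourceBranch` falls back to `"master"`, and that the result is written in `repo@branch` form. The actual download step with the Pachyderm client is left out on purpose.

```python
from typing import Dict, List


def pfs_sources(volume_config: Dict[str, object]) -> List[str]:
    """Hypothetical helper: list the 'repo@branch' sources for a PFS volume config."""
    if volume_config.get("sourceType") != "PFS":
        raise ValueError("volume config is not of sourceType 'PFS'")

    repos = volume_config.get("sourceRepo")
    # Assumption: several repos may be given as a list or a comma-separated string.
    if isinstance(repos, str):
        repos = [name.strip() for name in repos.split(",") if name.strip()]
    if not repos:
        raise ValueError("sourceRepo must name at least one input repo")

    # Assumption: default to "master" when no branch is given.
    branch = volume_config.get("sourceBranch") or "master"

    # The secret is expected to hold the Pachyderm host and auth details.
    options = volume_config.get("options") or {}
    if not options.get("pachSecretName"):
        raise ValueError("options.pachSecretName is required for PFS sources")

    return [f"{repo}@{branch}" for repo in repos]


example_config = {
    "id": "vol1",
    "sourceType": "PFS",
    "sourceRepo": "images, labels",  # placeholder repo names
    "sourceBranch": "master",
    "options": {"pachSecretName": "pach-secret"},  # placeholder secret name
}
print(pfs_sources(example_config))
```

Validating the config up front like this would let the controller reject a bad `VolumeManager` spec before it tries to reach Pachyderm at all.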