This repository has been archived by the owner on May 11, 2024. It is now read-only.

Pachyderm File System Source Type for KVC #22

Open
@dwhitena

Description

As discussed in our recent meeting, kubeflow/kubeflow#151 (comment) requires a way to expose data from Pachyderm to a TFJob. The same data access pattern would also be useful for integrating any distributed training framework (e.g., SparkML) or other resource into a Pachyderm pipeline.

In our discussion, we proposed adding a KVC source type that exposes data from the versioned Pachyderm File System (PFS), which is backed by an object store. I suggest this format:

apiVersion: kvc.kubeflow.org/v1
kind: VolumeManager
metadata:
  name: kvc-example1
  namespace: <insert-namespace-here>
spec:
  volumeConfigs:
    - id: "vol1"
      replicas: 1
      sourceType: "PFS"
      sourceRepo: <insert input repo name or names here>
      sourceBranch: <insert input repo branch here, e.g., "master">
      accessMode: "ReadWriteOnce"
      capacity: 5Gi
      labels:
        key1: val1
        key2: val2
      options:
        pachSecretName: <insert-secret-name-for-pach-auth-and-host>

This would allow the connector to use the Pachyderm client to pull the data from the specified repo(s) and branch into the volume.
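
To make the proposed fields concrete, here is a rough sketch of how a connector might validate a "PFS" volume config before pulling any data. It is only an illustration under assumptions: `PFSVolumeConfig` and `parse_pfs_config` are hypothetical names, not part of KVC or Pachyderm, and treating `sourceRepo` as either a single name or a list of names is my reading of "repo name or names" above. The actual data pull through the Pachyderm client is deliberately left out.

```python
from dataclasses import dataclass, field

# Hypothetical helper, not part of KVC: it only mirrors the fields
# proposed in the VolumeManager spec above.
REQUIRED_FIELDS = (
    "id", "sourceType", "sourceRepo", "sourceBranch", "accessMode", "capacity",
)


@dataclass
class PFSVolumeConfig:
    id: str
    source_repos: list
    source_branch: str
    access_mode: str
    capacity: str
    pach_secret_name: str
    replicas: int = 1
    labels: dict = field(default_factory=dict)


def parse_pfs_config(raw: dict) -> PFSVolumeConfig:
    """Validate one volumeConfigs entry and return a typed view of it."""
    missing = [key for key in REQUIRED_FIELDS if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if raw["sourceType"] != "PFS":
        raise ValueError(f"unsupported sourceType: {raw['sourceType']!r}")

    # The secret is assumed to carry the Pachyderm auth token and host.
    secret = (raw.get("options") or {}).get("pachSecretName")
    if not secret:
        raise ValueError("options.pachSecretName is required for sourceType PFS")

    # Assumption: sourceRepo may be a single repo name or a list of names.
    repos = raw["sourceRepo"]
    if isinstance(repos, str):
        repos = [repos]

    return PFSVolumeConfig(
        id=raw["id"],
        source_repos=list(repos),
        source_branch=raw["sourceBranch"],
        access_mode=raw["accessMode"],
        capacity=raw["capacity"],
        pach_secret_name=secret,
        replicas=int(raw.get("replicas", 1)),
        labels=dict(raw.get("labels") or {}),
    )


if __name__ == "__main__":
    # Example entry with made-up values for the placeholders in the spec.
    example = {
        "id": "vol1",
        "replicas": 1,
        "sourceType": "PFS",
        "sourceRepo": "training-data",
        "sourceBranch": "master",
        "accessMode": "ReadWriteOnce",
        "capacity": "5Gi",
        "labels": {"key1": "val1", "key2": "val2"},
        "options": {"pachSecretName": "pach-secret"},
    }
    print(parse_pfs_config(example))
```

Requiring `options.pachSecretName` up front is a design choice in this sketch, so that a misconfigured volume fails before any data transfer starts; the real handler could just as well defer that check.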
