Web UI should support getting artifacts from local path #1497
Comments
This is something we need as well, to be able to visualize the results correctly via the KFP UI on MiniKF. Can we coordinate to include this in 0.6? |
@jinchihe when you refer to the local path for the artifacts, what is the path local to? Is this the local file system of the machine running Minikube, a local file system of the container creating the artifact, or something else? |
The case is running on an on-prem cluster, and the artifacts will be saved to the path where the PVC is mounted. |
How will the UI get the data? Will the UI pod mount the same PV to retrieve and render it? How does this work if the UI needs to render data for different pipelines that use different PVs? |
Also, @mameshini has some extensions to better abstract artifact storage that work with on-prem and cloud. We are looking into porting them over as part of KF Pipelines as the default artifact storage. Please see this thread for details. |
@jlewi @IronPan indeed this is not trivial, and these are two distinct and orthogonal problems mentioned in this thread:
[1] needs to be tackled in a generic way that will work for different use cases, one of which is the Artifacts tab of the KFP UI. We have discussed this internally and I think we can contribute a generic mechanism that all UIs will be able to use for v0.7. This is related to the Tensorboard issue as well. So, yes, let's not consider this blocking for v0.6, because it needs significant work; we will aim to solve it universally for v0.7. For [2], I went through #596. I don't think it is related to this issue; it's more of an infra problem: how one chooses to implement PVCs at the K8s level. If one has PVCs backed by Goofys, which is in turn backed by an object store, then Pipelines and every other component will work transparently. @IronPan can you comment on why this may be different from any other PVC provider? |
Thanks @cvenets, downgrading to P1 and moving to 0.7.0. |
Anyone planning on tackling this in Q3? |
Hello, is there any update (or ETA) on this issue? And is there a workaround to see local artifacts produced by a step? |
We could support this consistently:
|
Yes, it would address the need: a single Kubernetes volume can be mounted to the Frontend pod and used for all data storage and passing. Tensorboard data can also be saved on that volume. Currently only GCS buckets can be used to visualize Tensorboard data, which is a serious limitation for other cloud providers, on-prem, and local setups. |
Question: can multiple pods mount the same PV? PVs on AWS-backed k8s use EBS, and I don't think we can mount the same EBS volume on multiple pods. Does that mean we need to use a custom volume backed by NFS or some storage service? |
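For context, whether multiple pods can share a PV depends on the volume's access mode: EBS-backed volumes are ReadWriteOnce, so sharing across pods on different nodes does require an RWX-capable backend such as NFS. A minimal sketch of a shared claim (the claim and storage class names below are hypothetical and depend on the cluster's provisioner):

```yaml
# A PVC that pipeline steps and the UI pod could mount simultaneously.
# Requires a provisioner that supports ReadWriteMany (e.g. NFS);
# EBS-backed storage classes only support ReadWriteOnce.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-artifacts   # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs      # hypothetical; use your cluster's RWX class
  resources:
    requests:
      storage: 10Gi
```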
Just a side note, I got the Tensorboard viewer to work with S3 by exposing an env var in the frontend that sets a path to a custom podspec, which I configure by mounting a configmap. |
@eterna2 How can we use your PR #1906 to use Tensorboard on S3? I opened an issue a while ago here: kubeflow/kubeflow#3773. Thanks! |
Tensorboard supports S3 through boto3 under the hood. You will need to either pass in the AWS credentials via env variables or set the pod annotations with the appropriate IAM roles (if your cluster is running kube2iam or equivalent) for the tensorboard pod to access your S3 bucket. My PR exposes an env var for this. The podTemplateSpec is used by the viewer controller to create the tensorboard viewer pod. You can create a configmap with a JSON podTemplateSpec for the tensorboard viewer containing the AWS credentials or pod annotations, then mount the configmap to the path referenced by the env var. The schema for podTemplateSpec can be found in the k8s API reference; you can ignore the image and args fields, as these are injected by the viewer controller. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#podtemplatespec-v1-core See here for the env vars to configure the AWS credentials: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html More on kube2iam here. Note also that changes to the spec are not retroactive: you will need to kill existing viewers and reload them to see pods with the updated podTemplateSpec. (A sketch of such a configmap is below.) |
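For illustration, a minimal sketch of such a configmap, assuming the viewer controller reads the podTemplateSpec from a file at a mounted path (the configmap name, data key, IAM role, and region below are all hypothetical; the actual env var name is defined in the PR):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tensorboard-viewer-template   # hypothetical name
  namespace: kubeflow
data:
  # PodTemplateSpec used as the base for the viewer pod; image and args
  # are injected by the viewer controller, so they are omitted here.
  podTemplateSpec.json: |
    {
      "metadata": {
        "annotations": {
          "iam.amazonaws.com/role": "tensorboard-s3-reader"
        }
      },
      "spec": {
        "containers": [
          {
            "env": [
              {"name": "AWS_REGION", "value": "us-west-2"}
            ]
          }
        ]
      }
    }
```

The configmap would then be mounted into the frontend pod at the path the env var points to.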
@eterna2 I am also using pod annotations to authenticate with AWS. I tried to patch the ml-pipeline-ui deployment and include the following patch:
But I am still getting
So the annotated pod is the one requesting it. Shouldn't this work? |
This is for tensorboard, not pipeline-ui, because minio-js does not support IAM roles. You need to wait for my PR #2081 to be merged before pod annotations will work for pipeline-ui. Meanwhile you can use a minio gateway to proxy to S3. |
https://github.com/minio/minio/blob/master/docs/gateway/s3.md A minio gateway is set up very similarly to the KF minio server. You just need to add the pod annotations and change the args (see the sketch below). |
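For illustration, a minimal sketch of such a gateway Deployment, modeled loosely on the stock KFP minio server (the names and the kube2iam role annotation below are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio-gateway   # hypothetical name
  namespace: kubeflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio-gateway
  template:
    metadata:
      labels:
        app: minio-gateway
      annotations:
        iam.amazonaws.com/role: s3-reader   # hypothetical kube2iam role
    spec:
      containers:
        - name: minio
          image: minio/minio
          # "gateway s3" makes minio proxy the S3 API to AWS S3
          # instead of serving from local disk
          args: ["gateway", "s3"]
          ports:
            - containerPort: 9000
```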
Thanks! It works pretty well. But the Tensorboard viewer pods will live forever... Do you think it's possible to add a spec to allow them to live only some minutes? Or do we have to build a CronJob to delete these pods automatically every hour? |
Not very familiar with the long thread here, is there anything left that still needs a solution? |
Yes, it's still unsolved for AWS and on-prem, as far as I know. Presenting Tensorboard data to end users is a high-value user story. Tensorboard data has to be presented not only after a pipeline step's execution is done, but also during execution, so that a data scientist can monitor model training progress. Let's assume there is a single PVC mounted to all pipeline steps, as well as to the pipeline UI. Many Kubeflow users are already mounting data to pods using tools like goofys. The pipeline UI has to access Tensorboard data via a local path without assuming that Tensorboard data is always in a GCS bucket. Alternatively, if the pipeline UI can get artifact data from Minio/S3 buckets, that would be fine too. I will be able to get back to testing this in a week. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it. |
I think this is what usually happens. |
The requested feature is already supported in https://github.com/kubeflow/pipelines/blob/master/docs/config/volume-support.md. |
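As I read that doc, the gist is that the ml-pipeline-ui pod mounts the same volume the pipeline steps write to, so the UI can resolve artifact paths locally. A minimal sketch of such a patch on the UI deployment (the claim name, mount path, and container name below are assumptions and must match your install and your pipeline's mounts):

```yaml
# Strategic-merge patch for the ml-pipeline-ui Deployment,
# applied e.g. with kubectl patch. Mounts the artifact PVC so the UI
# can read artifacts from the same local path the steps wrote to.
spec:
  template:
    spec:
      containers:
        - name: ml-pipeline-ui         # container name may differ per install
          volumeMounts:
            - name: artifacts
              mountPath: /mnt          # must match the pipeline's mount path
      volumes:
        - name: artifacts
          persistentVolumeClaim:
            claimName: pipeline-artifacts   # hypothetical claim name
```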
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
We now support mounting PVs/PVCs in pipelines for on-premise clusters. In this case the artifact will be stored in the PVC mount path, so we should support getting artifacts from a local path, such as /mnt/a07b8215-8c1a-11e9-a2ff-525400ed33aa/tfx-taxi-cab-classification-pipeline-example-q7rtq-2449399348/data/roc.csv. Thanks.