Cannot get MLMD objects from Metadata store. #187

Closed
3 tasks done
xwt-ml opened this issue Jul 26, 2024 · 6 comments · May be fixed by #191
Labels
kind/bug kind - things not working properly priority/needs-triage priority - needs to be triaged

Comments

@xwt-ml

xwt-ml commented Jul 26, 2024

Checks

deployKF Version

0.1.5

Kubernetes Distribution

EKS

Kubernetes Version

1.29

Description

A simple "hello world" example cannot be started successfully in a Kubeflow pipeline with kfp v2.
The following error message keeps appearing:
Cannot get MLMD objects from Metadata store.
This issue is related to kubeflow/pipelines#8733 and kubeflow/pipelines#8733.

Relevant Logs

No response

deployKF Values (Optional)

No response

@xwt-ml xwt-ml added kind/bug kind - things not working properly priority/needs-triage priority - needs to be triaged labels Jul 26, 2024
@haiminh2001

Hi @xwt-ml, have you found a solution? I saw on the Kubeflow GitHub that it should be fixed in Kubeflow 1.9.0, but deployKF does not seem to be able to upgrade the Kubeflow version.

@xwt-ml
Author

xwt-ml commented Aug 2, 2024

I downgraded kfp to 1.8.22. I have not found a solution that works with kfp 2.0.

@thesuperzapper
Member

@xwt-ml @haiminh2001 can you please try restarting the Deployment/metadata-envoy-deployment in the kubeflow namespace?

I see from others that there is some kind of race condition on that pod which causes this error.

Run the following command to do the restart:

kubectl rollout restart deployment/metadata-envoy-deployment --namespace kubeflow
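
For anyone scripting this, here is a minimal sketch that runs the same restart and then waits for the rollout to finish before retrying a pipeline. The function name `restart_metadata_envoy` and the 120s timeout are assumptions made for illustration and do not come from this thread; the deployment name and namespace are the ones given above.

```shell
#!/bin/sh
# Sketch only: restart metadata-envoy and block until the new Pod is rolled out.
# restart_metadata_envoy is a hypothetical helper name; the timeout is an arbitrary choice.
restart_metadata_envoy() {
  # Namespace defaults to "kubeflow", as in the command above.
  namespace="${1:-kubeflow}"
  deployment="deployment/metadata-envoy-deployment"

  # Trigger the restart, then wait for the rollout to complete (or time out).
  kubectl rollout restart "$deployment" --namespace "$namespace" &&
    kubectl rollout status "$deployment" --namespace "$namespace" --timeout=120s
}
```

Calling `restart_metadata_envoy` with no arguments targets the kubeflow namespace; the function returns a non-zero status if either kubectl command fails, so a script can stop before resubmitting a run.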

@thesuperzapper
Member

Assuming that restarting that Pod fixes it, I have made a PR to add a livenessProbe which will do it automatically:

@haiminh2001

I also saw that issue yesterday in Charmed Kubeflow. I restarted the deployment and now it works, but I am not 100% sure that the restart did the trick because I was applying other changes too. Adding probes can only help, so I think you should definitely add them.

@xwt-ml
Author

xwt-ml commented Aug 5, 2024

I restarted deployment/metadata-envoy-deployment and ran a kfp 2.0 pipeline in a new profile namespace. It works fine without the error.

@xwt-ml xwt-ml closed this as completed Aug 5, 2024