Web-app to manage tensorboard instances #3578
Comments
@jlewi yes, I would like to contribute to this for 0.7. We could start with a first implementation whose UX is very similar to the Jupyter web app and then iterate on that. |
/assign @kimwnasptd |
@kimwnasptd I'm punting this from 0.7 and downgrading to P2. I think graduating the Jupyter infrastructure to 1.0 is more important. |
@kimwnasptd any update on this? I think there was a mention of the fact that you had a prototype for a UI? Do you have an ETA for when you will have something ready to demo and then a potential PR? A TensorBoard controller was recently added |
tensorboard CRD seems to be very simple now. I am wondering if we should use deployment directly. Is there any benefit of introducing a new CRD here? |
@kimwnasptd are you interested in potentially being a mentor for any GSOC students interested in working on this? |
@kimwnasptd I am interested in doing my GSoC contribution on this project. Can you guide me through the next steps and the environment setup? |
/area gsoc |
@jlewi Please could you check the label sync job? I added the |
@sarahmaddox the label is there. |
Thanks @kimwnasptd. Here's my conjecture: it looks like the path you are on will eventually lead you to putting a PodTemplateSpec in the TensorBoardController (or the equivalent). Per #5039 you want to add nodeAffinity and presumably volumes as well. To support object storage you need to support setting secrets and service accounts.
To accommodate different versions of TensorBoard we will likely need to allow the Docker image to be specified as well. At that point, if we aren't using a PodTemplateSpec, the question arises: which PodTemplateSpec fields aren't being exposed? So it seems like what we really need is an easily extensible story for managing stateful web applications, e.g. Jupyter and TensorBoard. |
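To make the trade-off above concrete, here is a minimal sketch of the pass-through idea, assuming a hypothetical controller that owns the TensorBoard container but copies every other pod spec field from a user-supplied template. The function name, the default template, and the image tag are all illustrative assumptions, not the actual TensorBoard controller's schema or behavior. Only the field names (`affinity`, `volumes`, `serviceAccountName`) come from the Kubernetes PodSpec.

```python
import copy

# Hypothetical defaults a controller might generate; the image is a placeholder.
DEFAULT_TEMPLATE = {
    "spec": {
        "containers": [
            {
                "name": "tensorboard",
                "image": "tensorflow/tensorflow:latest",  # placeholder image
                "command": ["tensorboard"],
                "args": [],
            }
        ],
    },
}


def render_pod_template(logspath, user_template=None):
    """Build a pod template from defaults plus user-supplied pod spec fields.

    In this sketch the controller keeps ownership of the containers list and
    passes every other pod spec field (affinity, volumes, serviceAccountName,
    ...) through unchanged, so the CRD does not have to name each one.
    """
    template = copy.deepcopy(DEFAULT_TEMPLATE)
    template["spec"]["containers"][0]["args"] = [f"--logdir={logspath}"]

    user_spec = (user_template or {}).get("spec", {})
    for key, value in user_spec.items():
        if key == "containers":
            continue  # controller-owned in this sketch
        template["spec"][key] = copy.deepcopy(value)
    return template


example = render_pod_template(
    "gs://my-bucket/logs",  # hypothetical logs path
    {
        "spec": {
            "serviceAccountName": "tb-sa",  # hypothetical service account
            "volumes": [{"name": "logs", "emptyDir": {}}],
        }
    },
)
```

The alternative, adding one CRD field per need (nodeAffinity, then volumes, then secrets), requires a schema change each time, which is the concern raised in the comment.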
I really agree with your thought process and points, and I also think that having an extensible story for deploying our stateful apps, like Jupyter, Tensorboard, Theia etc., is a step in the right direction. My only concern right now is that I don't want the GSoC project to fall behind schedule or be blocked by this transition to a more abstract, reusable way of deploying our apps. The ideal scenario for me would be to:
@jlewi do you find the above plan reasonable? |
LGTM |
@jlewi @sarahmaddox, @kandrio98, @elikatsis and I are really excited to inform you that we have a first iteration of a web app for managing Tensorboard instances! @kandrio98 made a lot of contributions both to the Tensorboard controller (#5069 #5218 #5262 #5266) and to the actual web app (#5259 #5180 #5267). He has an e2e view of how to deploy a Tensorboard instance, all the way from the user's perspective in the UI down to the k8s controller that handles the CRs. You can take a quick look at the app in the frontend's PR, #5267. With this we can start discussing together what the next steps can be. Some things that come to mind:
All in all, I believe @kandrio98 learned a lot through this project, and it was a pleasure for both @elikatsis and me to mentor him. |
Well done @kandrio98, @kimwnasptd, and @elikatsis! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/lifecycle frozen |
This would also address an old issue which has since been closed regarding "how to spin up a web application on a Kubeflow cluster": kubeflow/website#2044 |
@kimwnasptd @jlewi Thank you for your contribution to TWA. I found an example that loads logs saved in GCS.
I tried to replace the logspath with my MinIO path, but it did not work. |
@wyljpn That's because tensorboard uses tf.io to connect to filesystems other than local and GCS, and it requires some env vars to be set on the tensorboard container. You need to set the following:
|
Hi, @ConverJens so glad to hear from you again.
So I think if we can pass the env vars to a Tensorboard pod, it could load logs from MinIO. But how to pass the env vars conveniently?
I read the source code of the Tensorboard controller. It seems that it supports only GCP, with the GCP secret mount hard-coded. There is no code for supporting S3 in the Tensorboard controller.
|
I added EnvFrom to the container spec, and with that it successfully supports S3-compatible object storage.
|
Can this change be merged? |
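For readers following the MinIO discussion, here is a minimal sketch of what an `envFrom`-based container spec could look like, written as a plain Python dict that mirrors the Kubernetes Container fields. This is not the change proposed in the comment above. The Secret name, image, and logs path are hypothetical, and the Secret is assumed to hold whatever variables tf.io needs for the object store (for S3-compatible stores these typically include `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `S3_ENDPOINT`).

```python
import json


def tensorboard_container(logspath, secret_name):
    """Return a Container-shaped dict that loads its env vars from a Secret.

    `envFrom` with a `secretRef` injects every key of the named Secret as an
    environment variable, so credentials do not have to be listed one by one.
    """
    return {
        "name": "tensorboard",
        "image": "tensorflow/tensorflow:latest",  # placeholder image
        "command": ["tensorboard"],
        "args": [f"--logdir={logspath}", "--bind_all"],
        # Hypothetical Secret holding the object-store credentials/endpoint.
        "envFrom": [{"secretRef": {"name": secret_name}}],
    }


container = tensorboard_container("s3://my-bucket/logs", "minio-credentials")

if __name__ == "__main__":
    print(json.dumps(container, indent=2))
```

Keeping the credentials in a Secret and referencing it by name means a controller only needs to know the Secret's name, not the individual variables a given storage backend expects.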
cross posting here, let's track this feature in #6493 |
We should consider creating a standalone web app similar to our jupyter web app that makes it easy for folks to create/delete tensorboard instances.
I suspect copying and modifying the jupyter web app to create a web app for managing tensorboard would be pretty straightforward.
Some pointers to get started
This project could be broken down into several pieces
Pipelines (@neuromage) created a Viewer CRD which can be used for TensorBoard.
https://github.com/kubeflow/pipelines/tree/master/backend/src/crd/controller/viewer
Currently I believe that is integrated into pipelines and used to auto-visualize tensorboard data reported by pipelines.
@kimwnasptd Thoughts? Any interest in possibly tackling this as part of 0.7?
/cc @karthikv2k
/cc @neuromage
/cc @vkoukis