[Discussion] Do we need JupyterHub? Should we just use a web app + K8s API Server? #1630
Comments
JupyterHub is built to run in a lot of places, not just Kubernetes. This is the primary reason it's built the way it is. It's heavily used in simple single-node installs, Docker-based installs, HPC systems, etc. We have an Outreachy project to make the proxy HA: https://github.com/jupyterhub/outreachy/blob/master/ideas/traefik-jupyterhub-proxy.rst. There's also an Ingress-based proxy implementation already. mybinder.org also runs a JupyterHub that sees over 60k sessions per week with no scalability problems, so we're pretty confident we're OK for now, and there should be an HA solution in the next 6 months or so.

In general, we have found that every re-implementation of 'let us spawn a bunch of notebooks' has had to fix bits and pieces of the various bugs the JupyterHub community has fixed in the last few years. Many are in the process of moving towards replacing their custom-built solution with JupyterHub, contributing fixes & features upstream as they go.

Re: Istio & others, we'd love to have ways to plug into that without tying ourselves to requiring Istio or any specific implementation. I'm trying to find some time to work with the Pangeo community to explore things like this.

We'd love to have more engagement from the Kubeflow community in the JupyterHub community than we have right now. This will be mutually beneficial in the long run, but only if we both make explicit efforts around it. I don't really think that's happening right now in any significant form.
That was my conjecture. It makes sense to me that JupyterHub provides a lot of functionality in order to support non-K8s platforms. I think my question is whether just running JupyterHub is really the best long-term solution for a good K8s-native deployment, or whether a more microservice, K8s-native architecture might be better.
Can you provide more examples? What sorts of problems would we run into managing the notebooks directly? On K8s I'd expect K8s to do all the heavy lifting of managing notebooks, so it's unclear to me what role the Hub is really playing that is likely to break if we reimplement it ourselves.
This is a good point and I would too. Along those lines, let me tell you what I'm thinking, and maybe you can point out where we'd benefit by sticking with JupyterHub.
I'm +1 on removing JupyterHub (I mentioned it several months ago). The number 1 issue for me is that user management in JH is not the same as in K8s. That means RBAC won't work for JH, which means we can't securely let notebooks talk to Kubernetes underneath and spawn new pods (for example TFJobs). We could rip out user management from JH, but what's left then?

JH is stateful, and that's an issue: what happens when the pod dies? Do you manage state externally? That's an issue for operators.

I'd be much more inclined towards something like a Jupyter CRD. It would spawn a Deployment of Jupyter, figure out storage backends (PVC? Object store?), set up clustering for distributed training and data science (Dask cluster configuration, for example), etc. We can also create a REST API or frontend for that, but I'd rather create a K8s CRD and maybe a frontend for all of Kubeflow (spawn me a Jupyter notebook, TFJob, Katib study, etc. on the same UI).
We (@ioandr, @iliastsi, myself) agree completely that it would be best to replace JupyterHub in the long run. I have tried to lay out a plan to do that in #34 (comment). Using JupyterHub as essentially a wrapper around KubeSpawner has a number of problems, as @inc0 also mentions in the comment above, including:
However, we don't have to do this now. We propose we continue using JupyterHub to iterate on the user-facing UI and improve the user experience (focusing on changes to KubeSpawner), document exactly what our limitations with JupyterHub are, and then switch to a K8s-specific web app to target them explicitly.
Sounds like a great plan to me.
@jlewi @vkoukis @inc0 @ashahba +1 on metacontroller. It allows rapid prototyping and could leverage a CRD. A CompositeController, for example, would intercept the CRD sent from KubeSpawner (or its replacement) and create a Deployment whose pod template specified a ServiceAccountName and PodSecurityContext appropriate for the user.
I've stumbled across this as a JupyterHub admin running a custom Windows HPC / docker setup but interested in moving to kubernetes. As such I really don't know what I'm talking about when it comes to K8s but some of the assertions above don't seem entirely correct to me. I'll pull out authorization in particular:
IIUC the JupyterHub authentication system is pluggable: what's preventing anyone from writing a K8sAuthenticator that integrates with the rest of the K8s ecosystem? For example, I'm using the […]

As for specifying resources:
I don't see why you couldn't simply have a custom options form. In my case, all users have a default home drive mapped, but other network folders are mapped into the container based on their AD group membership. The user doesn't have an option to not have a mapped folder available, but why would they care if they had more access than they needed?

I just don't see that any of the listed problems aren't easily fixable with very minimal code, rather than replacing the entire application. I think both communities would benefit from working together to fix the problems rather than going their own separate ways.

Anyway, just my 2c. There may well be some K8s-specific issues I'm unaware of which would make solving the stated problems more difficult than simply starting from scratch; I'm not knowledgeable enough about K8s to say, but I'd be very interested to hear @yuvipanda's opinion on the matter.
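To make the pluggability point concrete, here is a minimal sketch of the kind of logic a hypothetical K8sAuthenticator could delegate to: trusting a user header set by an upstream auth proxy. The header name and the trust model here are assumptions for illustration; a real deployment would verify a signed token rather than a bare header.

```python
# Sketch of the core of a hypothetical header-trusting authenticator.
# In JupyterHub, this logic would live in a custom Authenticator subclass
# (jupyterhub.auth.Authenticator) whose authenticate() method returns the
# username; it is shown here as a standalone function for clarity.

def user_from_headers(headers, header="X-Forwarded-User"):
    """Return the username asserted by an upstream auth proxy, or None.

    The header name is an assumption; Ambassador/IAP-style external auth
    would set something similar after verifying the user's identity.
    """
    value = headers.get(header, "").strip()
    return value or None
```

A real K8sAuthenticator would additionally need to map this username onto a Kubernetes ServiceAccount or RBAC subject, which is exactly the gap @inc0 describes above.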
Since it was asked, a bit of history on why JupyterHub has its current design and scaling characteristics: the target use case was a single machine with 5-50 users, and several design decisions were taken with user-space installability, maintainability, and simplicity in mind, while scalability was explicitly out of scope as something we knew we didn't have the resources to tackle. Since then, our user community has developed in a different direction than initially expected, and we have worked on scaling, but running more than 5k concurrent active users still isn't supported without deploying multiple independent Hubs (as @yuvipanda has done). If we had built JupyterHub to be more kube/cloud/scalable-native, it would look quite different.

On the proxy as single point of failure and scaling bottleneck: the default proxy implementation does now support external storage for its routing table, and there is an implementation for Redis, which means the proxy should be able to scale reasonably well. I have not tested this in production, though. You do seem to need several thousand users before proxy performance becomes an issue, so exposing this hasn't been a high priority yet. A better fit, especially in the k8s community, is probably the current plan to make an etcd-backed traefik implementation, as @yuvipanda mentioned.

With that said, JupyterHub is not *the* way to deploy notebooks on behalf of users. It is one way, meant to simplify deploying notebooks with one particular pattern. If your pattern/integrations are strongly divergent from the design of JupyterHub, it may well be more work to coerce JupyterHub into behaving how you like than to implement your own solution, tailored to your needs. Ultimately, all JupyterHub+KubeSpawner does is launch pods and provide routing/authentication. As with any shared infrastructure, JupyterHub and KubeSpawner have accumulated loads of fixes and helpers for corner cases that people have faced over the years.
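The routing table mentioned above is driven by a small REST API on JupyterHub's default proxy (configurable-http-proxy). As a sketch, here is how the request that registers a route could be constructed; the endpoint shape (POST /api/routes/&lt;path&gt; with a JSON target and a token Authorization header) follows configurable-http-proxy's documented API, while the URL, path, target, and token values are all placeholders.

```python
import json
import urllib.request

def build_add_route_request(api_url, route_path, target, token):
    """Build the POST that registers a route with configurable-http-proxy.

    All concrete values passed in are placeholders for illustration; the
    request is only constructed here, not sent.
    """
    return urllib.request.Request(
        api_url.rstrip("/") + "/api/routes" + route_path,
        data=json.dumps({"target": target}).encode(),
        headers={
            "Authorization": "token " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example (constructed only, never sent):
req = build_add_route_request(
    "http://127.0.0.1:8001", "/user/alice", "http://10.0.0.5:8888", "secret")
```

An external system that wanted to bypass the Hub could in principle drive the proxy directly through this API, which is one reason the routing layer is swappable.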
I'd say this is the main benefit of using JupyterHub for a case like yours, and the main thing you lose when rolling your own. For Kubernetes experts like you folks, deploying notebooks in pods in a kube-native application, it is very likely simpler to forego JupyterHub altogether. The target audience for JupyterHub is pretty much the complement of Kubeflow developers: it attempts to simplify deploying notebooks on behalf of users for folks who don't understand Kubernetes, rather than the other way around.

Of course, we're very happy to have Kubeflow use JupyterHub, but I would never argue that it's always the right choice, and for deployment experts with a given technology, there's a very good chance it's not. Getting feedback from you folks on exactly why/how JupyterHub isn't working well is super useful for us in guiding future development, whether you stick with it or not.

Even if/while you stick with JupyterHub, some ideas on reducing friction and accomplishing the listed goals piecemeal:

If you take these two approaches (together or separately), it may reduce friction with JupyterHub in the short term, and should you choose to move away from JupyterHub in the future, it should make that transition simpler, since fewer components are really relying on JupyterHub.
Thanks @minrk for the detailed info. I also see the multi-user work you just added (jupyterhub/jupyterhub#2154); very nice. I also see @yuvipanda has some in-progress work to support per-user namespaces: jupyterhub/kubespawner#76.
Here's some more context for the JupyterHub folks.

Here's a diagram of Kubeflow's current architecture. Ambassador is a programmable reverse proxy.
Ambassador supports external auth.
For comparison, here's the JupyterHub diagram that I pulled from the JupyterHub docs.

So in our case, Ambassador replaces the reverse proxy in JupyterHub, and external auth support in Ambassador replaces JupyterHub's authenticator plugins. As the diagram shows, we have many web apps that might require authentication, so we'd really like to do authentication outside JupyterHub, e.g. via Ambassador external auth, so that we don't have to reimplement it for each app. This is what we do right now on GCP, where we use IAP to attach JWTs to requests and we just configure JupyterHub to do JWT checking. If we move to Istio for JWT checking, then we can manage this centrally for all the web apps, which we will most likely need to do anyway.

So the two pieces we care most about in JupyterHub:
We could use JupyterHub as a REST API (and that is pretty much the plan of record, I think, from @vkoukis). As a long-term solution, though, I think a K8s custom resource might give us a simpler CRUD server with a more K8s-native API. In particular (following the pattern of TFJob/PyTorchJob), we could surface the K8s object, e.g. the PodTemplateSpec, directly in the spec. This would eliminate a lot of the spawner_options that are just a layer of indirection around K8s fields like labels, annotations, and sidecars. We can also take advantage of admission controllers to dynamically inject common configuration (e.g. a PVC that should be attached to all pods). I suspect we don't need the user database because we can just use K8s metadata to track which notebooks belong to which users.

So long term I think we'd want to move away from the JupyterHub spawner API to a more K8s-native API. When that happens, how easy would it be to share a frontend implementation? It would be great to collaborate with the Jupyter/JupyterHub communities on a K8s CRD for managing Jupyter notebooks on K8s. But would that be of interest to the Jupyter/JupyterHub communities? Would you want to pull in a dependency like metacontroller?
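To illustrate what a K8s-native API along these lines might look like, here is a sketch of a hypothetical Notebook custom resource that surfaces the PodTemplateSpec directly and tracks ownership via ordinary labels. Every name, group, and field here is illustrative, not an agreed design.

```yaml
# Hypothetical Notebook custom resource: the pod spec is exposed directly,
# so labels, annotations, and sidecars need no spawner_options indirection.
apiVersion: kubeflow.example.org/v1alpha1
kind: Notebook
metadata:
  name: alice-notebook
  labels:
    owner: alice          # user tracking via K8s metadata, no user database
spec:
  template:               # a standard PodTemplateSpec
    spec:
      containers:
      - name: notebook
        image: jupyter/base-notebook:latest
        ports:
        - containerPort: 8888
      # An admission controller could inject common configuration here,
      # e.g. a PVC that should be attached to all notebook pods.
```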
Thanks for the context @minrk, and I agree with @jlewi on the long-term plan for a K8s-based CRD. At my company, we rolled our own K8s-API-based notebook launch system instead of JupyterHub, for a lot of the same reasons:
I think the JupyterHub project is moving in a direction where all these pieces are pluggable, but it requires learning the JupyterHub API instead of using native K8s APIs. We could've definitely made JupyterHub work, but ultimately we decided it was easier to write our own API on top of raw K8s, since it gave us maximum flexibility and we already have the K8s domain knowledge.

For Kubeflow in general, I would be biased towards native K8s APIs whenever possible, since that is what the user base is comfortable with. It requires a non-trivial amount of effort to set up and configure a JupyterHub server with all the right options if you don't want the default auth and networking stack. I think switching KubeSpawner to a K8s CRD for notebooks is a great middle ground, since it would pick up upstream fixes from the JupyterHub community while avoiding the need to run a heavy server or learn a new technology for many K8s-native users. We might want a lightweight UI/REST endpoint around the CRD, since talking to the K8s API from a browser is pretty painful, but I think that can be solved independently.
@jlzhao27 Thanks! Can you share more about how you are controlling notebooks? In particular, which K8s controllers are you using? Did you have to implement any control logic beyond what the built-in controllers provide? My initial thought is that all we need is a StatefulSet and a Service, so a CRD would mostly be a small convenience. Longer term it would allow us to enable features like culling idle pods without exposing that to clients.
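For reference, the "StatefulSet and Service" baseline could look roughly like the following sketch. All names, labels, and the image are placeholders, and as @jlzhao27 notes in the next comment, the PVC requirement of StatefulSets may argue for a different workload type in some setups.

```yaml
# Minimal sketch of what a per-user notebook controller might create:
# one single-replica StatefulSet plus a Service used for routing.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: notebook-alice
spec:
  serviceName: notebook-alice
  replicas: 1
  selector:
    matchLabels: {app: notebook, owner: alice}
  template:
    metadata:
      labels: {app: notebook, owner: alice}
    spec:
      containers:
      - name: notebook
        image: jupyter/base-notebook:latest
        ports:
        - containerPort: 8888
---
apiVersion: v1
kind: Service
metadata:
  name: notebook-alice
spec:
  selector: {app: notebook, owner: alice}
  ports:
  - port: 80
    targetPort: 8888
```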
We actually were able to get away with only using […]. StatefulSets were not ideal because they require a PVC, whereas we sometimes wanted to manage volumes externally and attach the same PV to multiple notebook containers (we are using an NFS-based filesystem). We started initially with […]
sounds great, thanks @jlewi |
I've drafted an initial design doc; feel free to comment on it. Closing this issue now that we have the design doc.
@jlewi we have a working setup of JupyterHub with Istio and an external authentication service that is integrated with our OIDC provider. This setup allows separation of multiple tenants/teams, and of users within each team. I will post a diagram, and if you find it interesting, I can try to describe it in more detail.
@mlushpenko looks valuable, but at this point Kubeflow has fully migrated off JupyterHub and I don't think we are going back. We want OIDC support and multi-tenancy for multiple applications, not just Jupyter, so it doesn't make sense to go with a JupyterHub-centric approach.
@jlewi we did a test with pure notebooks as well; I can draw another diagram for that. It's close to this one, but path-based routing is used for user separation, and you need some component to pre-create notebooks with specific naming/labels (Kubeflow, in your case). We actually want to use our setup for more than notebooks as well. We wanted to use Kubeflow or parts of it first, but it had too much hardcoded, so it wasn't flexible enough for us if we wanted to start with only some Kubeflow components and integrate them with Istio in our own way (Kubeflow had a hardcoded Istio gateway and some other things).
You are right, you have a solution already; I thought maybe some ideas could be useful.
At the contributor summit yesterday, one question that came up is whether we should replace JupyterHub with a bunch of separate microservices, e.g.
There are a couple of potential reasons for doing this:
Long term we might be able to build a richer UI that would be better for users
* Lyft showed a picture of their UI (hopefully they'll share slides)
I don't think we really want to use JupyterHub for authentication
* Kubeflow consists of many web apps (TensorBoard, TFJobs UI, CentralDashboard, model servers)
* I think an architecture (see Secure proxy #11) where we put authentication in front of all the services makes sense
* Ideally we would use Istio to restrict access to individual services (e.g. ensure only user X can send requests to user X's notebook).
Scalability
* We think the JupyterHub reverse proxy might be a blocker to taking advantage of K8s to scale out (see Use Ambassador/Envoy as proxy for JupyterHub #239)
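The per-user restriction idea above could, in Istio's later AuthorizationPolicy API, be expressed roughly as follows. The label, JWT claim, and values are hypothetical, and Istio's auth APIs at the time of this discussion were different, so this is only a sketch of the shape of the policy.

```yaml
# Hypothetical policy: only requests whose verified JWT carries alice's
# email may reach pods labeled as alice's notebook.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: notebook-alice-only
spec:
  selector:
    matchLabels:
      owner: alice
  action: ALLOW
  rules:
  - when:
    - key: request.auth.claims[email]
      values: ["alice@example.com"]
```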
It would be valuable to understand why the folks working on JupyterHub/kube-spawner went that route as opposed to a more micro-service/k8s native architecture.
@yuvipanda @foxish Can you provide any background?
/cc @pdmack @kkasravi @ioandr @inc0
Related Issues
#34 JupyterHub UI element for spawning notebooks