-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add liveness and readness probe for jupyter notebook #5292
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
@googlebot I fixed it. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
@MartinForReal Is this disable-able/configurable for non-Jupyter docker images? Or even just JupyterHub ones? |
Signed-off-by: MartinForReal <fanshangxiang@gmail.com>
/assign @StefanoFioravanzo |
@thesuperzapper "/notebook/" + instance.Namespace + "/" + instance.Name + "/api/status" is the default URI generated by notebook controller. No matter what the underlying image is, the container should launch a Jupyter notebook instance. /api/status is a standard API provided by Jupiter notebook. So I think it should work well with all kinds of Jupyter instance. The notebook culling feature in the notebook controller is a good example. I can add one extra env flag for this, just like what ENABLE_CULLING did. How do you think? |
@MartinForReal, I am talking about when the image launched by the notebook controller is NOT a Jupyter image, for example: RStudio or Theia. It would be good if we can at least prepare for that, as this change would cause those images to perpetually restart. |
Why don't we create another controller for RStudio or Theia? @thesuperzapper |
/test kubeflow-presubmit |
@MartinForReal, I am fairly sure we can just generalise the existing notebook-controller to support any docker image which exposes a web socket. Currently you have to hack it a bit for non-jupyter, but it does work as long as you:
We can make this way easier if we:
|
Well, it is interesting but I'm afraid I have different ideas. I think there are a lot of solutions available which help us create three objects ( one statefulset, one service, and one virtualservice) such as helm, oam and etc. And they are generic enough so that we can apply them without any modification to our apps. To be more specific: what are the pros and cons when this solution is compared with helm operator + customized helm charts? By implementing the controller, we can customize the resource lifecycle to support Jupiter operations, especially day2 operations. |
@MartinForReal when I say "support any docker image which exposes a web socket" I am actually meaning any "web-based IDE" I think it would be crazy for us not to consolidate this into a single controller, as the code duplication would be crazy. For example, I believe these are the IDE's we need to support: |
But I don't think we can find a general solution that provides monitoring, error handling, authentication without specific customization. And we can extract common libraries from these implementations. I don't think code duplication issue does exist. |
@MartinForReal why not put a
The setting of Why do you need a new controller, and therefore CRD for each notebook type? |
I'm also trying to fix this problem, see #5282 Instead of adding the readiness probe to the notebook container via the notebook controller, we were thinking to do this using a new custom admission controller. The idea is that at the creation of a notebook pod, the admission control would jump in and mutate the pod as desired. The admission controller is configured via a MutatingWebhookConfiguration to act on the creation of Pods. Then, at create, the admission controller filters the right pod and only allows notebook pods to pass. The next step is to find the right container, as there are two (isto-proxy and the container running the nb image): this is done by looking at the environment variables. The nb container has a var named "NB_PREFIX", The final step is to add the readiness probe to the container. |
@thesuperzapper I don't see any improvement when it is compared with the new controller. because it makes validation logic and control logic much more complicated and we also need to add extra logic to query different kinds of instance. Cluster-API is a good example of this kind of design. But you can find that It is difficult to validate cr and implement providers. And there is no point in making it general because we can not share any code except object creation.
|
@MartinForReal but if you think about it from a user perspective, having a new resource type for each new type of IDE is cumbersome, as Having multiple Resource types also causes more pain in the frontend UI design, and just pushes all the complexity into the JavaScript/Python app. Under your design (multiple controllers) how would you propopose supporting a "generic" IDE type which could be used by Kubeflow admins for whatever internal tool they might have? Finally, I believe there is some code which is shared, especially complicated logic like:
|
@thesuperzapper well, I can hardly agree with you because daemonset, statefulset, and deployment all have a field which is the type of corev1.podSpec But it doesn't mean they should be managed by the same controller. and I think it is a common practise to apply Backend for FrontEnd pattern here. And we have graphql and other tools and technology which can simplify works in front end. And I highly doubled that requirement is realistic because the user does care what kind of ide they are using. And kubeflow admin always manage objects using labeling strategies. |
@MartinForReal regardless of how we manage the Notebook, please make this PR not break non-jupyter docker images. |
@MartinForReal, after thinking about it more, you are partially right but I think I have an idea which will work for both of us. We can create an annotation for the Notebook resource called
Note there is already precedent for annotations like this, see kubeflow-resource-disabled. If you do this, I can implement a user friendly way for Kubeflow administrators to provide non-jupyter images in the |
@kimwnasptd you might want to chip in |
First off, supporting more types of Notebooks is something we want to put effort in for Kubeflow 1.2 and forward. FYI we are moving towards abstracting the way we deploy our apps in Kubeflow. In the near future we'll be discussing on what technical details to abstract in order to deploy a stateful application on Kubeflow, such as a Notebook, an IDE, Tensorboard etc, via a more generic CutomResource. But until then, for this PR I also aggre with @thesuperzapper to not introduce extra functionality that only works for Jupyter. What I propose is:
This way we move forward with @MartinForReal's implementation, but we also allow for other people to add support for other types of Notebooks in the future. @MartinForReal could you update the code to align with the proposed plan? |
Hi all, this is a very interesting issue!
|
@@ -323,6 +322,36 @@ func generateStatefulSet(instance *v1beta1.Notebook) *appsv1.StatefulSet { | |||
}, | |||
} | |||
} | |||
if len(container.Ports) == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you assume there is only one container running in the pod, while in fact there are two; istio-proxy and notebook container. As result, the liveness- and readiness probe will only be added to the istio-proxy container and the notebook container will stay untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
istio-proxy is added by istio sidecar injector. And only when pod is created, the istio sidecar injector can add new containers.
The controller watches the change of notebook resource and creates the correlated statefulset. The controller can only find one container when it generate statefulset definitions. FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also target the port based off the name notebook-port
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry I did't know that 😊. Thanks for the information
@yanniszark Is there still an appetite for landing this? I just encoutered a case where the notebook's server crashed on startup and never came alive. It'd be great to have a liveness probe. |
Add liveness and readness probe for jupyter notebook
Fixed: #5282
Signed-off-by: MartinForReal fanshangxiang@gmail.com