-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with the current Go notebook controller #2456
Comments
Since container command/args are coupled with image, we think it doesn't make sense to provide default values for command/args. |
Thanks, didn't notice this. Do you mean our default value aren't set, or the user override values are not set? I will test it later...
Makes sense, we should allow port and jupterlab to be customized.
I didn't put ambassador routes because we are migrating to istio. #2261 |
Can the defaults be encoded in the docker image? e.g. any command line arguments, environment variables that should be set by default should be set in the docker image. My expectation is that most jupyter images are configured to start Jupyter by default just via
How could the controller pick sensible defaults for things like the command line and arguments? This will depend on the Jupyter image. So perhaps the problem is not with the controller but with the fact that our existing Jupyter images don't set a default entrypoint and args in the container image? |
I understand that users will use the webapp most of the time, but I argue against the controller depending on it. The arguments do not depend on the image really, they depend on the fact that we are starting
True, but they are all going to be Jupyter images. So, expecting a specific command (jupyter) to exist at a certain location, and accept certain Jupyter-specific arguments is a valid assumption for a controller that manages containers running Jupyter notebooks. This goes hand-in-hand with this argument:
Yes, the user can always set the environment variables by hand by overriding the pod spec in their submission, but I think it makes sense for the Jupyter controller to set environment variables that it knows the Jupyter software respects. And it makes sense for the Notebook CRD to have Jupyter-specific fields outside the Pod spec, e.g.,
This is a good argument! And I agree that the container images should be self-standing. If I start the container image, just by running But until all images are like this, I think it makes sense that the controller has a default pod template associated with it, and in this template the administrator specifies a default image, and a default command, and defaults args in this template, so submissions of very simple Notebooks CRs work. |
I had a similar issue with the tensorboard component where the port was a parameter arg but fixed in other places including in the docker file. If you provide command, args in the CRD you can avoid this by never using the docker ENTRYPOINT. But we should try and be consistent and compatible with kustomize. My 2¢ |
i can think of two options
Where does kustomize come into play? Doesn't kustomize provide the same flexibility to override command or args? |
yes - it was more of a comment about leveraging the same things that kustomize uses like overlays on the command and args so moving to kustomize at a later point is easy. |
@jlewi thanks for the feedback. Here's what I don't get:
Do we expect the user to always pass a full pod spec as part of the Notebook they submit, or do we expect them to pass only the parts they want to override? To put it differently, I think it makes sense for the Controller to always have a base pod spec, from which it starts. This could be hardcoded in the controller, or even better, could be part of a ConfigMap, set by the administrator, mounted by the Controller. If this happens, then I think the Controller shouldn't have to worry about Image, command, and args. In this case, whatever the user has to submit becomes minimal, which I think is always a good thing. So, there's two different scenarios:
Having a base pod spec that can boot no matter what would make sense in any case. Because then we could start with an almost empty Notebook CR, and still have a working notebook. What do you think? |
I think this is also aligned with what @kkasravi describes with his reference to kustomize: The controller is a mechanism that starts with a base pod, which is self-standing, ideally, and applies the customizations that come from the Notebook CR that the user submitted. The user can omit fields they may not care about [command, args], and they will be inherited from the base pod spec, so the end result is always valid to submit to the K8s API. |
I know I am repeating myself, but aren't we expecting most users to use webapp? So this isn't making user experience hard.
But see our default cmd:
It's assuming start.sh, not just jupyter. So I think this makes more sense:
Users will still have a easy way to launch notebook with Webapp by just selecting the image. |
Hello Lun-Kai, You are right, but I am making the argument that the abstraction of submitting Notebook CRs based on a Notebook CRD and having them managed by a Notebook Controller, is a good abstraction on its own. Yes, most people will use the webapp, but it makes sense to keep the abstraction as clean as possible at the level of submitting a Notebook CR. I mean, what do we gain, by having the webapp submit magic arguments in the Notebook CR?
A similar argument can be made for So, it's not obvious to me why this wouldn't work: Or, to put it differently: What is it that makes the Notebook Controller actually be a Notebook controller, and not a generic mechanism to submit Pod manifests? I think it is the fact that the Controller knows that it is orchestrating pods that run Images that contain jupyter, and passes arguments to the jupyter command based on fields of the Notebook CRD. |
I actually think not setting the default args is cleaner and the better abstraction.
I think we agree(?) that the whether we set default arg or not doesn't affect the UX of 99% of our users. So the opposite question can be asked as "what do we gain, by having the controller setting the default args"
Doesn't this mean the default arg only works for kubeflow-built images?
I think the controller won't depend on this; it will depend on UseLab?
I don't think it's only about setting arg for the pod. The controller will also secure the access, and do TTL for the notebook. |
@lluunn We had a very nice discussion with @jlewi [thanks! ], it is all logged here: I'll try to summarize in this comment, @jlewi please correct whatever I may have missed: We agree that if all container images have a proper entrypoint [command/args] specified, then we're good, and everything will work, and we will avoid the ugly magic argument list in the web app, so we'll go forward with taking this as granted. There is also the issue of how the controller will know what port the image is set up to listen at, and we agreed that the Controller should have a default value for This seems like a way for us to move forward. My only reservation would be: [it doesn't apply at this point, but we can keep it and use it as initial context if/when we revisit this]: I agree that having the default value for containerPort as a command-line argument to the controller will work. But if we see that we start gathering more command-line arguments for default values for things that are specified as part of a pod spec, then it may make sense to have a "base pod spec" containing all of these defaults, instead of different command-line arguments, and provide this base pod spec as part of a ConfigMap to the Controller. |
@lluunn @jlewi
For (1), I see #2458, which waits for this issue, #2456, but what is missing for this issue to close? For point (2), it seems that #2463 does not pass the target port in the Ambassador mapping, so this still blocks #2456. Is there anything else we should keep in mind, to move forward with #2456 and #2458? |
The service's port will match 80 to 8888 (or container port specified), and ambassador maps route to service, so IIUC ambassador annotation should not need the containerPort? |
Hey Lun-Kai, I think you are referring to the fact that if you
I know it doesn't make sense, but from what we have seen, you have to specify the target port in the mapping explicitly. And if I understand correctly, it doesn't really matter what Service you annotate with the mapping. It doesn't make much sense, maybe @jlewi has some more insight on this, but this is what we've seen, please correct us if we're wrong! |
I see, didn't know that. Will also fix in #2463 |
The issues here are fixed. Other improvement in #2417 |
I am currently developing according to OpenShift and Nvidia GPU Daemon Plugin specs from here: https://github.com/zvonkok/origin-ci-gpu/ The Pod template should also have custom environment variables for GPU acceleration, such as
In my case, a spawned GPU container showed the following when checking the logs oc logs jupyter-mlk8user | grep -b1 -a10 CUDA
|
Hello @lluunn @jlewi
@kimwnasptd , @ioandr and myself are playing with the new Jupyter web app to make it work end-to-end with the new Go-based notebook controller, so we can then proceed with a final review and merge of #2357.
I am writing to summarize a few things we bumped into, it's a single GitHub issue to keep the discussion focused, but If you feel it's best to open distinct issues for them, please let us know.
containerPort
, the user is not required to provide it with in the submitted Notebook CR, but they can always override it. If I understand correctly, the controller can access this value asNotebook.spec.template.spec.containers[0].containerPort
. The controller should accept an explicit port choice by the user, because they know better what their presumably custom Image does, but should not require it.use_websocket: true
because the Jupyter UI requires it and breaks without it.
JUPYTER_ENABLE_LAB
env var toTRUE
(line 116). This is a more philosophical question, better suited to notebook controller Add CRD validation #2417 , but it would make sense for the user to specify some Jupyter-specific settings inside the Notebook CR directly. Yes, the user could just specify environment variables as part of the Pod spec, but it makes sense to simplify setting some Jupyter-specific things, because the Notebook CRD is a Jupyter-specific CRD.And finally, a bigger issue:
It seems the webapp must specify the Image, command, and arguments to the command explicitly when creating a new Notebook CR. I understand the stated goal of allowing the user to work with arbitrary images #2208, but making these arguments mandatory makes the common case rather difficult:
Why would a simple user have to specify the full notebook Image, and the command to run, along with its arguments, in every single YAML that they deploy?
Given how elaborate the argument list currently is, if I understand correctly, something similar to:
This is taken from @kkasravi 's metacontroller, but it seems to be missing from the current Go implementation.
We cannot expect the user to specify this magic string inside the Notebook YAML that they submit.
But they can always override this, if they really know what they're doing. For example, if they actually override the
--port=8888
argument, they'd better also modify thecontainerPort
field of the Pod spec, so the controller knows how to create the Ambassador mapping, see above.The Notebook controller already knows it's handling Jupyter notebooks. So it makes sense to take certain things for granted when deploying a Pod based on an Image containing the Jupyter software, e.g., that the Jupyter software is installed at a certain default location, and takes certain arguments.
So, an alternative would be: Would it make sense to have the Notebook controller use sane defaults for the Image, the command, its arguments, and the containerPort to connect to? This way, the user just submits a super-minimal Notebook YAML, overriding maybe just the resources they need.
Actually, since all these are part of the Pod template, would it make sense for the administrator, who deploys the Controller, to give it a default Pod template containing these values, as part of a ConfigMap?
The text was updated successfully, but these errors were encountered: