feat: update notebook server images (#7590)

* feat: update example notebook server images
* feat: move `01-copy-tmp-home` to base image
* fix: rstudio HOME_TMP copy permissions
* feat: update `torch` and `tensorflow`
* feat: manually install `tensorrt`

---------

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
kubeflow · May 27, 2024 · 3f7ecfc
1 parent 8869f22
Showing 25 changed files with 283 additions and 185 deletions.
diff --git a/components/example-notebook-servers/README.md b/components/example-notebook-servers/README.md
@@ -60,32 +60,45 @@ Dockerfile | Container Registry | Notes
 [`./jupyter-tensorflow-cuda`](./jupyter-tensorflow-cuda) | [`kubeflownotebookswg/jupyter-tensorflow-cuda`](https://hub.docker.com/r/kubeflownotebookswg/jupyter-tensorflow-cuda) | JupyterLab + TensorFlow + CUDA
 [`./jupyter-tensorflow-cuda-full`](./jupyter-tensorflow-cuda-full) | [`kubeflownotebookswg/jupyter-tensorflow-cuda-full`](https://hub.docker.com/r/kubeflownotebookswg/jupyter-tensorflow-cuda-full) | JupyterLab + TensorFlow + CUDA + Common Packages
 
-## Custom Images
+## Package Installation
 
 Packages installed by users __after spawning__ a Kubeflow Notebook will only last the lifetime of the pod (unless installed into a PVC-backed directory).
 
 To ensure packages are preserved throughout Pod restarts users will need to either:
 
-1. Build custom images that include them, or
+1. Build [custom images](#custom-images) that include them, or
 2. Ensure they are installed in a PVC-backed directory
 
+## Custom Images
+
+You can build your own custom images to use with Kubeflow Notebooks.
+
+The easiest way to ensure your custom image meets the [requirements](#image-requirements) is to extend one of our [base images](#base-images).
+
+### Image Requirements
+
+For a container image to work with Kubeflow Notebooks, it must:
+
+- expose an HTTP interface on port `8888`:
+  - kubeflow sets an environment variable `NB_PREFIX` at runtime with the URL path we expect the container to be listening under
+  - kubeflow uses iframes, so ensure your application sets `Access-Control-Allow-Origin: *` in HTTP response headers
+- run as a user called `jovyan`:
+  - the home directory of `jovyan` should be `/home/jovyan`
+  - the UID of `jovyan` should be `1000`
+- start successfully with an empty PVC mounted at `/home/jovyan`:
+  - kubeflow mounts a PVC at `/home/jovyan` to keep state across Pod restarts
+
 ### Install Python Packages
 
 You may extend one of the images and install any `pip` or `conda` packages your Kubeflow Notebook users are likely to need.
-
 As a guide, look at [`./jupyter-pytorch-full/Dockerfile`](./jupyter-pytorch-full/Dockerfile) for a `pip install ...` example, and the [`./rstudio-tidyverse/Dockerfile`](./rstudio-tidyverse/Dockerfile) for `conda install ...`.
 
-> __NOTE:__ 
-> 
-> A common cause of errors is users running `pip install --user ...`, causing the home-directory (which is backed by a PVC) to contain a different or incompatible version of a package contained in  `/opt/conda/...`
+A common cause of errors is users running `pip install --user ...`, which causes the home directory (backed by a PVC) to contain a different or incompatible version of a package also installed in `/opt/conda/...`
 
 ### Install Linux Packages
 
 You may extend one of the images and install any `apt-get` packages your Kubeflow Notebook users are likely to need.
-
-> __NOTE:__ 
-> 
-> Ensure you swap to `root` in the Dockerfile before running `apt-get`, and swap back to `$NB_USER` after.
+Ensure you swap to `root` in the Dockerfile before running `apt-get`, and swap back to `$NB_USER` after.
 
 ### Configure S6 Overlay
 
@@ -107,25 +120,110 @@ The purpose of this script is to snapshot any `KUBERNETES_*` environment variabl
 
 Extra services to be monitored by `s6-overlay` should be placed in their own folder under `/etc/services.d/` containing a script called `run` and optionally a finishing script `finish`.
 
+An example of a service can be found in the `run` script of [`./jupyter/s6/services.d/jupyterlab`](./jupyter/s6/services.d/jupyterlab), which is used to start JupyterLab itself.
 For more information about the `run` and `finish` scripts, please see the [s6-overlay documentation](https://github.com/just-containers/s6-overlay#writing-a-service-script).
 
-An example of a service can be found in the `run` script of [jupyter/s6/services.d/jupyterlab](jupyter/s6/services.d/jupyterlab) which is used to start JupyterLab itself.
-
 #### Run Services As Root
 
 There may be cases when you need to run a service as root, to do this, you can change the Dockerfile to have `USER root` at the end, and then use `s6-setuidgid` to run the user-facing services as `$NB_USER`.
 
-> __NOTE:__ 
-> 
-> Our example images run `s6-overlay` as `$NB_USER` (not `root`), meaning any files or scripts related to `s6-overlay` must be owned by the `$NB_USER` user to successfully run.
+Our example images run `s6-overlay` as `$NB_USER` (not `root`), meaning any files or scripts related to `s6-overlay` must be owned by the `$NB_USER` user to successfully run.
+
+## Build Images
+
+The server images depend on each other, so you MUST build them in the correct order.
+
+### Build Single Image
+
+You can build a single image (and its dependencies) by running `make` commands in its directory.
+
+For example, to build the [`./jupyter-scipy`](./jupyter-scipy) image:
+
+```bash
+# from the root of the repository
+cd components/example-notebook-servers/jupyter-scipy
+
+# optionally define a version tag
+#  default: sha-{GIT_COMMIT}{GIT_TREE_STATE}
+#export TAG="X.Y.Z-patch.N"
+
+# configure the image registry
+#  full image name: {REGISTRY}/{IMAGE_NAME}:{TAG}
+export REGISTRY="docker.io/MY_USERNAME"
+
+# build and push (current CPU architecture)
+make docker-build-dep
+make docker-push-dep
+```
+
+To build the image for different CPU architectures, you can use the following commands:
+
+```bash
+# from the root of the repository
+cd components/example-notebook-servers/jupyter-scipy
+
+# optionally define a version tag
+#export TAG="X.Y.Z-patch.N"
+
+# configure the image registry
+export REGISTRY="docker.io/MY_USERNAME"
+
+# define the image build cache
+#  - sets the following build args:
+#     cache-from: type=registry,ref={CACHE_IMAGE}:{IMAGE_NAME}
+#     cache-to:   type=registry,ref={CACHE_IMAGE}:{IMAGE_NAME},mode=max
+#  - currently, this is a requirement for multi-arch builds.
+#    you won't have access to push to the upstream cache,
+#    so you will need to set your own cache image.
+export CACHE_IMAGE="ghcr.io/kubeflow/kubeflow/notebook-servers/build-cache"
+
+# define the architectures to build for
+export ARCH="linux/amd64,linux/arm64"
+
+# build and push (multiple CPU architectures)
+# requires that `docker buildx` is configured
+make docker-build-push-multi-arch-dep
+```
+
+### Build All Images
 
+You can build all images (in the correct order) by running a `make` command in the root of this directory.
 
-## Troubleshooting
+For example, to build all images:
 
-### Jupyter
+```bash
+# from the root of the repository
+cd components/example-notebook-servers
 
-#### Kernel stuck in `connecting` state:
+# optionally define a version tag
+#export TAG="X.Y.Z-patch.N"
 
-This is a problem that occurs from time to time and is not a Kubeflow problem, but rather a browser.
-It can be identified by looking in the browser error console, which will show errors regarding the websocket not connecting.
-To solve the problem, please restart your browser or try using a different browser.
+# configure the image registry
+export REGISTRY="docker.io/MY_USERNAME"
+
+# build and push (current CPU architecture)
+make docker-build
+make docker-push
+```
+
+To build the images for different CPU architectures, you can use the following commands:
+
+```bash
+# from the root of the repository
+cd components/example-notebook-servers
+
+# optionally define a version tag
+#export TAG="X.Y.Z-patch.N"
+
+# configure the image registry
+export REGISTRY="docker.io/MY_USERNAME"
+
+# define the image build cache
+export CACHE_IMAGE="ghcr.io/kubeflow/kubeflow/notebook-servers/build-cache"
+
+# define the architectures to build for
+export ARCH="linux/amd64,linux/arm64"
+
+# build and push (multiple CPU architectures)
+make docker-build-push-multi-arch
+```
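The comments in these build commands describe how image references are composed: `{REGISTRY}/{IMAGE_NAME}:{TAG}` for the image, and `{CACHE_IMAGE}:{IMAGE_NAME}` for the registry build cache. The sketch below only mirrors those comments. The helpers `image_ref` and `cache_ref` are hypothetical and are not targets in the repository's Makefile. Unlike the Makefile, the sketch does not compute the default `sha-{GIT_COMMIT}{GIT_TREE_STATE}` tag, so it requires `TAG` to be set.

```bash
# Hypothetical helpers that mirror the naming scheme described in the
# README comments; they do NOT invoke the real Makefile.

# image_ref IMAGE_NAME -> {REGISTRY}/{IMAGE_NAME}:{TAG}
image_ref() {
  local image_name="$1"
  # ${VAR:?msg} aborts with an error message if the variable is unset or empty
  echo "${REGISTRY:?REGISTRY must be set}/${image_name}:${TAG:?TAG must be set}"
}

# cache_ref IMAGE_NAME -> {CACHE_IMAGE}:{IMAGE_NAME}
# (used as cache-from/cache-to for multi-arch builds, per the README comments)
cache_ref() {
  local image_name="$1"
  echo "${CACHE_IMAGE:?CACHE_IMAGE must be set}:${image_name}"
}
```

For example, with `export REGISTRY="docker.io/MY_USERNAME" TAG="X.Y.Z-patch.N"`, running `image_ref jupyter-scipy` prints `docker.io/MY_USERNAME/jupyter-scipy:X.Y.Z-patch.N`. This assumes the image name matches the directory name, which is illustrative only.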
diff --git a/components/example-notebook-servers/base/Dockerfile b/components/example-notebook-servers/base/Dockerfile
@@ -13,13 +13,20 @@ ENV NB_UID 1000
 ENV NB_PREFIX /
 ENV HOME /home/$NB_USER
 ENV SHELL /bin/bash
-# Might be needed for slow storage see https://github.com/pi-hole/docker-pi-hole/pull/1192
-# Value is in milliseconds and 300000 is 5 minutes
+
+# we copy the contents of $HOME_TMP to $HOME on startup
+# this is to work around the fact that a PVC will be mounted to $HOME
+# but we still want to have some default files in $HOME
+# see `s6/cont-init.d/01-copy-tmp-home`
+ENV HOME_TMP /tmp_home/$NB_USER
+
+# s6 only gives 5 seconds by default, which is too short for slow PVC storage backends
+# when running `/etc/cont-init.d/01-copy-tmp-home` (note, this is in milliseconds)
 ENV S6_CMD_WAIT_FOR_SERVICES_MAXTIME 300000
 
 # args - software versions
-ARG KUBECTL_VERSION=v1.27.6
-ARG S6_VERSION=v3.1.5.0
+ARG KUBECTL_VERSION=v1.27.14
+ARG S6_VERSION=v3.1.6.2
 
 # set shell to bash
 SHELL ["/bin/bash", "-c"]
@@ -79,7 +86,9 @@ RUN curl -fsSL "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/${TARGETA
 # create user and set required ownership
 RUN useradd -M -s /bin/bash -N -u ${NB_UID} ${NB_USER} \
  && mkdir -p ${HOME} \
+ && mkdir -p ${HOME_TMP} \
  && chown -R ${NB_USER}:users ${HOME} \
+ && chown -R ${NB_USER}:users ${HOME_TMP} \
  && chown -R ${NB_USER}:users /usr/local/bin
 
 # set locale configs
@@ -89,6 +98,9 @@ ENV LANG en_US.UTF-8
 ENV LANGUAGE en_US.UTF-8
 ENV LC_ALL en_US.UTF-8
 
+# s6 - copy scripts
+COPY --chown=${NB_USER}:users --chmod=755 s6/ /etc
+
 USER $NB_UID
 
 ENTRYPOINT ["/init"]
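The base image creates the `jovyan` user with UID `1000` and home directory `/home/jovyan`, which are the user requirements listed in the README's "Image Requirements". Below is a minimal sketch for checking a built image against those user requirements. `check_nb_user` is a hypothetical helper that is not part of the repository. It assumes the image contains a POSIX shell and `getent`.

```bash
# check_nb_user USER EXPECTED_UID EXPECTED_HOME
# Prints one line per failed check and returns non-zero if any check fails.
check_nb_user() {
  local user="$1" want_uid="$2" want_home="$3"
  local entry uid home rc=0

  # getent exits non-zero when the user is not in the passwd database
  if ! entry="$(getent passwd "$user")"; then
    echo "FAIL: user '$user' does not exist"
    return 1
  fi

  # passwd fields: name:password:UID:GID:GECOS:home:shell
  uid="$(printf '%s\n' "$entry" | cut -d: -f3)"
  home="$(printf '%s\n' "$entry" | cut -d: -f6)"

  if [ "$uid" != "$want_uid" ]; then
    echo "FAIL: UID of '$user' is $uid, expected $want_uid"
    rc=1
  fi
  if [ "$home" != "$want_home" ]; then
    echo "FAIL: home of '$user' is $home, expected $want_home"
    rc=1
  fi
  return "$rc"
}
```

One way to run it against an image is `docker run --rm --entrypoint bash "$IMAGE" -c "$(declare -f check_nb_user); check_nb_user jovyan 1000 /home/jovyan"`. This bypasses `/init` and only covers the user checks. It does not test port `8888`, `NB_PREFIX` handling, or starting with an empty PVC.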
diff --git a/components/example-notebook-servers/base/s6/cont-init.d/01-copy-tmp-home b/components/example-notebook-servers/base/s6/cont-init.d/01-copy-tmp-home
@@ -0,0 +1,8 @@
+#!/command/with-contenv bash
+
+# the home directory is usually a PVC
+# we need to copy the contents of $HOME_TMP that we populated during the build
+# NOTE: -n prevents overwriting existing files
+# NOTE: -T ensures that the CONTENTS of $HOME_TMP are copied, and not the directory itself
+echo "INFO: Copying contents of '$HOME_TMP' to '$HOME'..."
+cp -n -r -T "$HOME_TMP" "$HOME"
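This startup script relies on two GNU `cp` flags:

- `-n` never overwrites an existing file.
- `-T` copies the contents of `$HOME_TMP` rather than the directory itself.

Below is a minimal sketch of the `-n` behaviour on a PVC that already holds user data. It assumes GNU coreutils and uses throwaway directories in place of the real `$HOME_TMP` and `$HOME`. The file names are illustrative only.

```bash
# Scratch directories standing in for the image's $HOME_TMP and the PVC at $HOME
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
home_tmp="$work/tmp_home/jovyan"   # populated when the image is built
home="$work/home/jovyan"           # the PVC mounted at runtime
mkdir -p "$home_tmp/.config" "$home"

# defaults baked into the image
echo "default settings" > "$home_tmp/.config/settings"
echo "default bashrc" > "$home_tmp/.bashrc"

# a file the user already edited on the PVC during a previous Pod run
echo "user bashrc" > "$home/.bashrc"

# the same copy the startup script performs
cp -n -r -T "$home_tmp" "$home"

cat "$home/.bashrc"            # "user bashrc": -n left the existing file alone
cat "$home/.config/settings"   # "default settings": missing files were copied
```

Because `-n` skips existing files, a user's edits on the PVC survive Pod restarts. The flip side is that updated defaults from a newer image are not applied to files that already exist in `$HOME`.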
diff --git a/components/example-notebook-servers/codeserver-python/Dockerfile b/components/example-notebook-servers/codeserver-python/Dockerfile
@@ -10,10 +10,10 @@ ARG TARGETARCH
 USER root
 
 # args - software versions
-ARG CODESERVER_PYTHON_VERSION=2023.18.0
-ARG MINIFORGE_VERSION=23.3.1-1
-ARG PIP_VERSION=23.2.1
-ARG PYTHON_VERSION=3.11.6
+ARG CODESERVER_PYTHON_VERSION=2024.4.1
+ARG MINIFORGE_VERSION=24.3.0-0
+ARG PIP_VERSION=24.0
+ARG PYTHON_VERSION=3.11.9
 
 # setup environment for conda
 ENV CONDA_DIR /opt/conda
@@ -60,8 +60,7 @@ RUN code-server --install-extension "ms-python.python@${CODESERVER_PYTHON_VERSIO
  && code-server --list-extensions --show-versions
 
 # s6 - 01-copy-tmp-home
-USER root
-RUN mkdir -p /tmp_home \
- && cp -r ${HOME} /tmp_home \
- && chown -R ${NB_USER}:users /tmp_home
-USER $NB_UID
+# NOTE: the contents of $HOME_TMP are copied to $HOME at runtime
+#       this is a workaround because a PVC will be mounted at $HOME
+#       and the contents of $HOME will be hidden
+RUN cp -r -T "${HOME}" "${HOME_TMP}"
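This build step now copies into `$HOME_TMP`, which the base image creates ahead of time (`mkdir -p ${HOME_TMP}`). Because the destination already exists, `-T` matters: without it, `cp -r` places the source directory inside the destination. Below is a sketch contrasting the two, assuming GNU coreutils and using throwaway directories.

```bash
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

home="$work/home/jovyan"
home_tmp="$work/tmp_home/jovyan"
mkdir -p "$home" "$home_tmp"   # like the base image, the destination exists up front
echo "x" > "$home/.profile"

# without -T: the destination exists, so cp nests the source directory inside it
cp -r "$home" "$home_tmp"
ls -A "$home_tmp"              # prints "jovyan" (i.e. .../tmp_home/jovyan/jovyan/.profile)

# reset, then copy with -T: the destination is treated as the target directory itself
rm -rf "$home_tmp" && mkdir -p "$home_tmp"
cp -r -T "$home" "$home_tmp"
ls -A "$home_tmp"              # prints ".profile"
```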
diff --git a/components/example-notebook-servers/codeserver-python/requirements.txt b/components/example-notebook-servers/codeserver-python/requirements.txt
@@ -1,3 +1,2 @@
 # kubeflow packages
-kfp==2.4.0
-kfp-server-api==2.0.3
+kfp==2.7.0
diff --git a/components/example-notebook-servers/codeserver/Dockerfile b/components/example-notebook-servers/codeserver/Dockerfile
@@ -8,7 +8,7 @@ FROM $BASE_IMG
 ARG TARGETARCH
 
 # args - software versions
-ARG CODESERVER_VERSION=v4.17.1
+ARG CODESERVER_VERSION=v4.89.1
 
 USER root
 
@@ -20,11 +20,6 @@ RUN curl -fsSL "https://github.com/coder/code-server/releases/download/${CODESER
 # s6 - copy scripts
 COPY --chown=${NB_USER}:users --chmod=755 s6/ /etc
 
-# s6 - 01-copy-tmp-home
-RUN mkdir -p /tmp_home \
- && cp -r ${HOME} /tmp_home \
- && chown -R ${NB_USER}:users /tmp_home
-
 USER $NB_UID
 
 EXPOSE 8888
diff --git a/components/example-notebook-servers/codeserver/s6/cont-init.d/01-copy-tmp-home b/components/example-notebook-servers/codeserver/s6/cont-init.d/01-copy-tmp-home
diff --git a/components/example-notebook-servers/jupyter-pytorch-cuda-full/Dockerfile b/components/example-notebook-servers/jupyter-pytorch-cuda-full/Dockerfile
@@ -8,16 +8,17 @@ FROM $BASE_IMG
 # install - conda packages
 # NOTE: we use mamba to speed things up
 RUN mamba install -y -q \
-    bokeh==3.2.2 \
+    bokeh==3.3.4 \
     cloudpickle==2.2.1 \
-    dill==0.3.7 \
-    ipympl==0.9.3 \
-    matplotlib==3.8.0 \
-    pandas==2.1.1 \
+    dill==0.3.8 \
+    ipympl==0.9.4 \
+    matplotlib==3.8.4 \
+    numpy==1.24.4 \
+    pandas==2.1.4 \
     scikit-image==0.22.0 \
-    scikit-learn==1.3.1 \
+    scikit-learn==1.3.2 \
     scipy==1.11.3 \
-    seaborn==0.13.0 \
+    seaborn==0.13.2 \
     xgboost==1.7.6 \
  && mamba clean -a -f -y
 

diff --git a/components/example-notebook-servers/jupyter-pytorch-cuda-full/requirements.txt b/components/example-notebook-servers/jupyter-pytorch-cuda-full/requirements.txt
@@ -1,6 +1,5 @@
 # kubeflow packages
-kfp==2.4.0
-kfp-server-api==2.0.3
+kfp==2.7.0
 
 # jupyterlab extensions
-jupyterlab-git==0.44.0
+jupyterlab-git==0.50.1
diff --git a/components/example-notebook-servers/jupyter-pytorch-cuda/Dockerfile b/components/example-notebook-servers/jupyter-pytorch-cuda/Dockerfile
@@ -6,9 +6,9 @@ ARG BASE_IMG=<jupyter>
 FROM $BASE_IMG
 
 # args - software versions
-ARG PYTORCH_VERSION=2.1.0
-ARG TORCHAUDIO_VERSION=2.1.0
-ARG TORCHVISION_VERSION=0.16.0
+ARG PYTORCH_VERSION=2.3.0
+ARG TORCHAUDIO_VERSION=2.3.0
+ARG TORCHVISION_VERSION=0.18.0
 
 # nvidia container toolkit
 # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html
@@ -19,8 +19,8 @@ ENV NVIDIA_REQUIRE_CUDA "cuda>=12.1"
 # install - pytorch (cuda)
 RUN python3 -m pip install --quiet --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 \
     torch==${PYTORCH_VERSION} \
-    torchvision==${TORCHVISION_VERSION} \
-    torchaudio==${TORCHAUDIO_VERSION}
+    torchaudio==${TORCHAUDIO_VERSION} \
+    torchvision==${TORCHVISION_VERSION}
 
 # install - requirements.txt
 COPY --chown=${NB_USER}:users requirements.txt /tmp

diff --git a/components/example-notebook-servers/jupyter-pytorch-full/Dockerfile b/components/example-notebook-servers/jupyter-pytorch-full/Dockerfile
@@ -8,16 +8,17 @@ FROM $BASE_IMG
 # install - conda packages
 # NOTE: we use mamba to speed things up
 RUN mamba install -y -q \
-    bokeh==3.2.2 \
+    bokeh==3.3.4 \
     cloudpickle==2.2.1 \
-    dill==0.3.7 \
-    ipympl==0.9.3 \
-    matplotlib==3.8.0 \
-    pandas==2.1.1 \
+    dill==0.3.8 \
+    ipympl==0.9.4 \
+    matplotlib==3.8.4 \
+    numpy==1.24.4 \
+    pandas==2.1.4 \
     scikit-image==0.22.0 \
-    scikit-learn==1.3.1 \
+    scikit-learn==1.3.2 \
     scipy==1.11.3 \
-    seaborn==0.13.0 \
+    seaborn==0.13.2 \
     xgboost==1.7.6 \
  && mamba clean -a -f -y
 

diff --git a/components/example-notebook-servers/jupyter-pytorch-full/requirements.txt b/components/example-notebook-servers/jupyter-pytorch-full/requirements.txt
@@ -1,6 +1,5 @@
 # kubeflow packages
-kfp==2.4.0
-kfp-server-api==2.0.3
+kfp==2.7.0
 
 # jupyterlab extensions
-jupyterlab-git==0.44.0
+jupyterlab-git==0.50.1