feat: update notebook server images (#7590)
* feat: update example notebook server images

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* feat: move `01-copy-tmp-home` to base image

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* fix: rstudio HOME_TMP copy permissions

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* feat: update `torch` and `tensorflow`

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* feat: manually install `tensorrt`

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

---------

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
thesuperzapper authored May 27, 2024
1 parent 8869f22 commit 3f7ecfc
Showing 25 changed files with 283 additions and 185 deletions.
140 changes: 119 additions & 21 deletions components/example-notebook-servers/README.md
@@ -60,32 +60,45 @@ Dockerfile | Container Registry | Notes
[`./jupyter-tensorflow-cuda`](./jupyter-tensorflow-cuda) | [`kubeflownotebookswg/jupyter-tensorflow-cuda`](https://hub.docker.com/r/kubeflownotebookswg/jupyter-tensorflow-cuda) | JupyterLab + TensorFlow + CUDA
[`./jupyter-tensorflow-cuda-full`](./jupyter-tensorflow-cuda-full) | [`kubeflownotebookswg/jupyter-tensorflow-cuda-full`](https://hub.docker.com/r/kubeflownotebookswg/jupyter-tensorflow-cuda-full) | JupyterLab + TensorFlow + CUDA + Common Packages

## Custom Images
## Package Installation

Packages installed by users __after spawning__ a Kubeflow Notebook will only last the lifetime of the Pod (unless installed into a PVC-backed directory).

To ensure packages are preserved across Pod restarts, users will need to either:

1. Build custom images that include them, or
1. Build [custom images](#custom-images) that include them, or
2. Ensure they are installed in a PVC-backed directory

## Custom Images

You can build your own custom images to use with Kubeflow Notebooks.

The easiest way to ensure your custom image meets the [requirements](#image-requirements) is to extend one of our [base images](#base-images).

### Image Requirements

For a container image to work with Kubeflow Notebooks, it must:

- expose an HTTP interface on port `8888`:
    - Kubeflow sets an environment variable `NB_PREFIX` at runtime with the URL path we expect the container to be listening under
    - Kubeflow uses IFrames, so ensure your application sets `Access-Control-Allow-Origin: *` in HTTP response headers
- run as a user called `jovyan`:
    - the home directory of `jovyan` should be `/home/jovyan`
    - the UID of `jovyan` should be `1000`
- start successfully with an empty PVC mounted at `/home/jovyan`:
    - Kubeflow mounts a PVC at `/home/jovyan` to keep state across Pod restarts
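
The user-related requirements in this list can be checked mechanically. The following is a minimal sketch, assuming it is sourced inside a candidate image; the function name `check_notebook_user` is hypothetical and not part of Kubeflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Kubeflow): compares a user name, UID and
# home directory against the values listed in the image requirements.
check_notebook_user() {
  local name="$1" uid="$2" home="$3"
  local ok=0
  [ "$name" = "jovyan" ] || { echo "user is '$name', expected 'jovyan'"; ok=1; }
  [ "$uid" = "1000" ] || { echo "UID is '$uid', expected '1000'"; ok=1; }
  [ "$home" = "/home/jovyan" ] || { echo "home is '$home', expected '/home/jovyan'"; ok=1; }
  # returns 0 when every value matches, 1 otherwise
  return "$ok"
}

# Example: check the user the current shell is running as.
# check_notebook_user "$(id -un)" "$(id -u)" "$HOME"
```

This only covers the user, UID, and home directory; whether the HTTP interface honours `NB_PREFIX` still has to be verified against the running application.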

### Install Python Packages

You may extend one of the images and install any `pip` or `conda` packages your Kubeflow Notebook users are likely to need.

As a guide, look at [`./jupyter-pytorch-full/Dockerfile`](./jupyter-pytorch-full/Dockerfile) for a `pip install ...` example, and the [`./rstudio-tidyverse/Dockerfile`](./rstudio-tidyverse/Dockerfile) for `conda install ...`.

> __NOTE:__
>
> A common cause of errors is users running `pip install --user ...`, causing the home-directory (which is backed by a PVC) to contain a different or incompatible version of a package contained in `/opt/conda/...`
A common cause of errors is users running `pip install --user ...`, causing the home-directory (which is backed by a PVC) to contain a different or incompatible version of a package contained in `/opt/conda/...`
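
One way to spot this situation is to look for package directories that exist both in the user's home directory and in the image. The sketch below is an illustration only: the helper name `list_shadowed_packages` is hypothetical, and the example paths (including the Python minor version) are assumptions that depend on the image.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the names of entries that exist in BOTH
# directories, e.g. a user site-packages directory and the image's own.
list_shadowed_packages() {
  local user_site="$1" image_site="$2"
  local entry name
  # nothing to report if the user directory does not exist
  [ -d "$user_site" ] || return 0
  for entry in "$user_site"/*; do
    [ -e "$entry" ] || continue   # the glob did not match (empty directory)
    name="$(basename "$entry")"
    if [ -e "$image_site/$name" ]; then
      echo "$name"
    fi
  done
}

# Example (paths are assumptions; adjust the Python version to your image):
# list_shadowed_packages "$HOME/.local/lib/python3.11/site-packages" \
#   "/opt/conda/lib/python3.11/site-packages"
```

A name printed by this helper is only a hint that the home directory may be overriding the image's copy; it does not compare versions.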

### Install Linux Packages

You may extend one of the images and install any `apt-get` packages your Kubeflow Notebook users are likely to need.

> __NOTE:__
>
> Ensure you swap to `root` in the Dockerfile before running `apt-get`, and swap back to `$NB_USER` after.
Ensure you swap to `root` in the Dockerfile before running `apt-get`, and swap back to `$NB_USER` after.

### Configure S6 Overlay

@@ -107,25 +120,110 @@ The purpose of this script is to snapshot any `KUBERNETES_*` environment variabl

Extra services to be monitored by `s6-overlay` should be placed in their own folder under `/etc/services.d/` containing a script called `run` and optionally a finishing script `finish`.

An example of a service can be found in the `run` script of [`.jupyter/s6/services.d/jupyterlab`](./jupyter/s6/services.d/jupyterlab) which is used to start JupyterLab itself.
For more information about the `run` and `finish` scripts, please see the [s6-overlay documentation](https://github.com/just-containers/s6-overlay#writing-a-service-script).

An example of a service can be found in the `run` script of [jupyter/s6/services.d/jupyterlab](jupyter/s6/services.d/jupyterlab) which is used to start JupyterLab itself.
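
As a rough illustration of the folder layout described above, the sketch below writes a minimal `run` script for a new service. The helper name `scaffold_service`, the service name, and the command are all hypothetical; the shebang is the one used by the `01-copy-tmp-home` script in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes a minimal `run` script for a service into
# <services_dir>/<name>/run. The command to run is supplied by the caller.
scaffold_service() {
  local services_dir="$1" name="$2" command="$3"
  mkdir -p "$services_dir/$name"
  # unquoted heredoc delimiter, so $command is expanded when the file is written
  cat > "$services_dir/$name/run" <<EOF
#!/command/with-contenv bash
# started and supervised by s6-overlay
exec $command
EOF
  chmod 755 "$services_dir/$name/run"
}

# Example: in an image build this would target /etc/services.d
# scaffold_service /etc/services.d my-service "my-server --port 8080"
```

Using `exec` in the `run` script replaces the shell with the service process, so the supervisor tracks the service itself rather than a wrapper shell.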

#### Run Services As Root

There may be cases when you need to run a service as root. To do this, change the Dockerfile to have `USER root` at the end, and then use `s6-setuidgid` to run the user-facing services as `$NB_USER`.

> __NOTE:__
>
> Our example images run `s6-overlay` as `$NB_USER` (not `root`), meaning any files or scripts related to `s6-overlay` must be owned by the `$NB_USER` user to successfully run.
Our example images run `s6-overlay` as `$NB_USER` (not `root`), meaning any files or scripts related to `s6-overlay` must be owned by the `$NB_USER` user to successfully run.

## Build Images

The server images depend on each other, so you MUST build them in the correct order.

### Build Single Image

You can build a single image (and its dependencies) by running `make` commands in its directory.

For example, to build the [`./jupyter-scipy`](./jupyter-scipy) image:

```bash
# from the root of the repository
cd components/example-notebook-servers/jupyter-scipy

# optionally define a version tag
# default: sha-{GIT_COMMIT}{GIT_TREE_STATE}
#export TAG="X.Y.Z-patch.N"

# configure the image registry
# full image name: {REGISTRY}/{IMAGE_NAME}:{TAG}
export REGISTRY="docker.io/MY_USERNAME"

# build and push (current CPU architecture)
make docker-build-dep
make docker-push-dep
```
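
The comments in the block above describe how the full image name is composed. A minimal sketch of that composition follows; the helper name `image_ref` is hypothetical, and how the Makefile actually derives `GIT_COMMIT` and `GIT_TREE_STATE` is not shown here, so they are passed in as plain arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: composes {REGISTRY}/{IMAGE_NAME}:{TAG}.
# When no tag is given, falls back to sha-{GIT_COMMIT}{GIT_TREE_STATE}.
image_ref() {
  local registry="$1" image_name="$2" tag="${3:-}"
  local git_commit="${4:-}" git_tree_state="${5:-}"
  if [ -z "$tag" ]; then
    tag="sha-${git_commit}${git_tree_state}"
  fi
  echo "${registry}/${image_name}:${tag}"
}

# Example with an explicit tag:
# image_ref "docker.io/MY_USERNAME" "jupyter-scipy" "X.Y.Z-patch.N"
```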

To build the image for different CPU architectures, you can use the following commands:

```bash
# from the root of the repository
cd components/example-notebook-servers/jupyter-scipy

# optionally define a version tag
#export TAG="X.Y.Z-patch.N"

# configure the image registry
export REGISTRY="docker.io/MY_USERNAME"

# define the image build cache
# - sets the following build args:
# cache-from: type=registry,ref={CACHE_IMAGE}:{IMAGE_NAME}
# cache-to: type=registry,ref={CACHE_IMAGE}:{IMAGE_NAME},mode=max
# - currently, this is a requirement for multi-arch builds.
# you won't have access to push to the upstream cache,
# so you will need to set your own cache image.
export CACHE_IMAGE="ghcr.io/kubeflow/kubeflow/notebook-servers/build-cache"

# define the architectures to build for
export ARCH="linux/amd64,linux/arm64"

# build and push (multiple CPU architectures)
# requires that `docker buildx` is configured
make docker-build-push-multi-arch-dep
```
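
To make the cache settings in the comments above concrete, the sketch below prints the corresponding `--cache-from` and `--cache-to` flags of `docker buildx build` for a single image. The helper name `buildx_cache_flags` and the example cache repository are hypothetical; how the Makefile passes these values is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the cache flags for one image, one per line,
# using the ref format {CACHE_IMAGE}:{IMAGE_NAME} described above.
buildx_cache_flags() {
  local cache_image="$1" image_name="$2"
  local ref="${cache_image}:${image_name}"
  echo "--cache-from=type=registry,ref=${ref}"
  echo "--cache-to=type=registry,ref=${ref},mode=max"
}

# Example (the cache repository is a placeholder you must be able to push to):
# buildx_cache_flags "ghcr.io/MY_ORG/build-cache" "jupyter-scipy"
```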

### Build All Images

You can build all images (in the correct order) by running a `make` command in the root of this directory.

For example, to build all images:

```bash
# from the root of the repository
cd components/example-notebook-servers

# optionally define a version tag
#export TAG="X.Y.Z-patch.N"

# configure the image registry
export REGISTRY="docker.io/MY_USERNAME"

# build and push (current CPU architecture)
make docker-build
make docker-push
```

To build the images for different CPU architectures, you can use the following commands:

```bash
# from the root of the repository
cd components/example-notebook-servers

# optionally define a version tag
#export TAG="X.Y.Z-patch.N"

# configure the image registry
export REGISTRY="docker.io/MY_USERNAME"

# define the image build cache
export CACHE_IMAGE="ghcr.io/kubeflow/kubeflow/notebook-servers/build-cache"

# define the architectures to build for
export ARCH="linux/amd64,linux/arm64"

# build and push (multiple CPU architectures)
make docker-build-push-multi-arch
```

## Troubleshooting

### Jupyter

#### Kernel stuck in `connecting` state:

This is a problem that occurs from time to time and is not a Kubeflow problem, but rather a browser problem.
It can be identified by looking in the browser error console, which will show errors regarding the websocket not connecting.
To solve the problem, please restart your browser or try using a different browser.
20 changes: 16 additions & 4 deletions components/example-notebook-servers/base/Dockerfile
@@ -13,13 +13,20 @@ ENV NB_UID 1000
ENV NB_PREFIX /
ENV HOME /home/$NB_USER
ENV SHELL /bin/bash
# Might be needed for slow storage see https://github.com/pi-hole/docker-pi-hole/pull/1192
# Value is in milliseconds and 300000 is 5 minutes

# we copy the contents of $HOME_TMP to $HOME on startup
# this is to work around the fact that a PVC will be mounted to $HOME
# but we still want to have some default files in $HOME
# see `s6/cont-init.d/01-copy-tmp-home`
ENV HOME_TMP /tmp_home/$NB_USER

# s6-overlay only gives 5 seconds by default, which is too small for slow PVC storage backends
# when running `/etc/cont-init.d/01-copy-tmp-home` (note, this is in milliseconds)
ENV S6_CMD_WAIT_FOR_SERVICES_MAXTIME 300000

# args - software versions
ARG KUBECTL_VERSION=v1.27.6
ARG S6_VERSION=v3.1.5.0
ARG KUBECTL_VERSION=v1.27.14
ARG S6_VERSION=v3.1.6.2

# set shell to bash
SHELL ["/bin/bash", "-c"]
@@ -79,7 +86,9 @@ RUN curl -fsSL "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/${TARGETA
# create user and set required ownership
RUN useradd -M -s /bin/bash -N -u ${NB_UID} ${NB_USER} \
&& mkdir -p ${HOME} \
&& mkdir -p ${HOME_TMP} \
&& chown -R ${NB_USER}:users ${HOME} \
&& chown -R ${NB_USER}:users ${HOME_TMP} \
&& chown -R ${NB_USER}:users /usr/local/bin

# set locale configs
@@ -89,6 +98,9 @@ ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ENV LC_ALL en_US.UTF-8

# s6 - copy scripts
COPY --chown=${NB_USER}:users --chmod=755 s6/ /etc

USER $NB_UID

ENTRYPOINT ["/init"]
@@ -0,0 +1,8 @@
#!/command/with-contenv bash

# the home directory is usually a PVC
# we need to copy the contents of $HOME_TMP that we populated during the build
# NOTE: -n prevents overwriting existing files
# NOTE: -T ensures that the CONTENTS of $HOME_TMP are copied, and not the directory itself
echo "INFO: Copying contents of '$HOME_TMP' to '$HOME'..."
cp -n -r -T "$HOME_TMP" "$HOME"
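
The effect of the `-n` and `-T` flags can be tried on scratch directories. The sketch below is an illustration, not the script itself: the function name `copy_defaults` is hypothetical, it requires GNU `cp` (for `-T`), and it deliberately ignores the exit status.

```bash
#!/usr/bin/env bash
# Sketch of the startup copy, parameterised so it can be tried on scratch
# directories. Requires GNU cp (for -T).
copy_defaults() {
  local home_tmp="$1" home="$2"
  # -n: never overwrite a file that already exists in $home
  # -r: recurse into sub-directories
  # -T: treat $home as the destination itself, so the CONTENTS of
  #     $home_tmp land directly in $home (not in $home/<dirname>)
  # The exit status is ignored in this demo because it has differed between
  # GNU coreutils releases when existing files are skipped.
  cp -n -r -T "$home_tmp" "$home" || true
}

# Example:
# copy_defaults /tmp_home/jovyan /home/jovyan
```

Files the user has already modified on the PVC are left untouched, while defaults that are missing from the PVC are filled in.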
17 changes: 8 additions & 9 deletions components/example-notebook-servers/codeserver-python/Dockerfile
@@ -10,10 +10,10 @@ ARG TARGETARCH
USER root

# args - software versions
ARG CODESERVER_PYTHON_VERSION=2023.18.0
ARG MINIFORGE_VERSION=23.3.1-1
ARG PIP_VERSION=23.2.1
ARG PYTHON_VERSION=3.11.6
ARG CODESERVER_PYTHON_VERSION=2024.4.1
ARG MINIFORGE_VERSION=24.3.0-0
ARG PIP_VERSION=24.0
ARG PYTHON_VERSION=3.11.9

# setup environment for conda
ENV CONDA_DIR /opt/conda
@@ -60,8 +60,7 @@ RUN code-server --install-extension "ms-python.python@${CODESERVER_PYTHON_VERSIO
&& code-server --list-extensions --show-versions

# s6 - 01-copy-tmp-home
USER root
RUN mkdir -p /tmp_home \
&& cp -r ${HOME} /tmp_home \
&& chown -R ${NB_USER}:users /tmp_home
USER $NB_UID
# NOTE: the contents of $HOME_TMP are copied to $HOME at runtime
# this is a workaround because a PVC will be mounted at $HOME
# and the contents of $HOME will be hidden
RUN cp -r -T "${HOME}" "${HOME_TMP}"
@@ -1,3 +1,2 @@
# kubeflow packages
kfp==2.4.0
kfp-server-api==2.0.3
kfp==2.7.0
7 changes: 1 addition & 6 deletions components/example-notebook-servers/codeserver/Dockerfile
@@ -8,7 +8,7 @@ FROM $BASE_IMG
ARG TARGETARCH

# args - software versions
ARG CODESERVER_VERSION=v4.17.1
ARG CODESERVER_VERSION=v4.89.1

USER root

@@ -20,11 +20,6 @@ RUN curl -fsSL "https://github.com/coder/code-server/releases/download/${CODESER
# s6 - copy scripts
COPY --chown=${NB_USER}:users --chmod=755 s6/ /etc

# s6 - 01-copy-tmp-home
RUN mkdir -p /tmp_home \
&& cp -r ${HOME} /tmp_home \
&& chown -R ${NB_USER}:users /tmp_home

USER $NB_UID

EXPOSE 8888

This file was deleted.

@@ -8,16 +8,17 @@ FROM $BASE_IMG
# install - conda packages
# NOTE: we use mamba to speed things up
RUN mamba install -y -q \
bokeh==3.2.2 \
bokeh==3.3.4 \
cloudpickle==2.2.1 \
dill==0.3.7 \
ipympl==0.9.3 \
matplotlib==3.8.0 \
pandas==2.1.1 \
dill==0.3.8 \
ipympl==0.9.4 \
matplotlib==3.8.4 \
numpy==1.24.4 \
pandas==2.1.4 \
scikit-image==0.22.0 \
scikit-learn==1.3.1 \
scikit-learn==1.3.2 \
scipy==1.11.3 \
seaborn==0.13.0 \
seaborn==0.13.2 \
xgboost==1.7.6 \
&& mamba clean -a -f -y

@@ -1,6 +1,5 @@
# kubeflow packages
kfp==2.4.0
kfp-server-api==2.0.3
kfp==2.7.0

# jupyterlab extensions
jupyterlab-git==0.44.0
jupyterlab-git==0.50.1
@@ -6,9 +6,9 @@ ARG BASE_IMG=<jupyter>
FROM $BASE_IMG

# args - software versions
ARG PYTORCH_VERSION=2.1.0
ARG TORCHAUDIO_VERSION=2.1.0
ARG TORCHVISION_VERSION=0.16.0
ARG PYTORCH_VERSION=2.3.0
ARG TORCHAUDIO_VERSION=2.3.0
ARG TORCHVISION_VERSION=0.18.0

# nvidia container toolkit
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html
@@ -19,8 +19,8 @@ ENV NVIDIA_REQUIRE_CUDA "cuda>=12.1"
# install - pytorch (cuda)
RUN python3 -m pip install --quiet --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 \
torch==${PYTORCH_VERSION} \
torchvision==${TORCHVISION_VERSION} \
torchaudio==${TORCHAUDIO_VERSION}
torchaudio==${TORCHAUDIO_VERSION} \
torchvision==${TORCHVISION_VERSION}

# install - requirements.txt
COPY --chown=${NB_USER}:users requirements.txt /tmp
@@ -8,16 +8,17 @@ FROM $BASE_IMG
# install - conda packages
# NOTE: we use mamba to speed things up
RUN mamba install -y -q \
bokeh==3.2.2 \
bokeh==3.3.4 \
cloudpickle==2.2.1 \
dill==0.3.7 \
ipympl==0.9.3 \
matplotlib==3.8.0 \
pandas==2.1.1 \
dill==0.3.8 \
ipympl==0.9.4 \
matplotlib==3.8.4 \
numpy==1.24.4 \
pandas==2.1.4 \
scikit-image==0.22.0 \
scikit-learn==1.3.1 \
scikit-learn==1.3.2 \
scipy==1.11.3 \
seaborn==0.13.0 \
seaborn==0.13.2 \
xgboost==1.7.6 \
&& mamba clean -a -f -y

@@ -1,6 +1,5 @@
# kubeflow packages
kfp==2.4.0
kfp-server-api==2.0.3
kfp==2.7.0

# jupyterlab extensions
jupyterlab-git==0.44.0
jupyterlab-git==0.50.1