
Update Dockerfile (GPU Support, Workdir, Permissions) #1313

Merged · 2 commits merged into mindee:main on Sep 20, 2023

Conversation

@ffalkenberg
Contributor
commented Sep 12, 2023

This PR updates the Dockerfile to use NVIDIA's TensorFlow container for better GPU support, changes the sources that are copied into the image, and creates a working directory. A combined sketch of the resulting Dockerfile follows the lists below.

Changes:

  1. Base Image Update:

    • Old: FROM python:3.8-slim
    • New: FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3

    The base image has been updated to use NVIDIA's TensorFlow container. This change ensures that the container is optimized for GPU usage and includes all necessary dependencies for TensorFlow to utilize NVIDIA GPUs.

  2. Environment Variables:

    • Added ENV DOCTR_CACHE_DIR=/app/.cache to specify a cache directory.
    • Added ENV PATH=/app/venv/bin:$PATH to include the virtual environment's bin directory in the PATH.
  3. Working Directory:

    • New: WORKDIR /app

    The working directory has been set to /app to provide a consistent location for application files.

  4. File Copying:

    • Old: Files were copied to /tmp.
    • New: All files are now copied directly to the /app directory with COPY . .
  5. Permissions:

    • New: chmod -R a+w /app

    Permissions have been updated to ensure that all users have write access to the /app directory. This change is crucial as some environments, like Kubernetes/OpenShift, do not allow containers to run as root. By adjusting the permissions, we ensure compatibility with such environments.

Additional Information:

  • General-Purpose Usage: The updated Dockerfile is more versatile due to the inclusion of additional sources. It's designed to be general-purpose, catering to a broader range of applications and supporting both CPU and GPU environments.

  • CUDA & cuDNN Dependency: The new base image from NVIDIA's TensorFlow container comes with CUDA and cuDNN pre-installed, ensuring that TensorFlow can leverage GPU acceleration out of the box. More details can be found on TensorFlow's official documentation and NVIDIA's container catalog.

  • Demo App: With the inclusion of more sources in the Dockerfile, running applications like the demo app becomes more straightforward and optimized.
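Putting the above together, here is a minimal sketch of how the described changes combine into one Dockerfile. The dependency installation step is an assumption for illustration; the PR's actual install commands may differ.

```dockerfile
# Sketch only: combines the changes described in this PR description.
FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3

ENV DOCTR_CACHE_DIR=/app/.cache
ENV PATH=/app/venv/bin:$PATH

# Consistent location for application files
WORKDIR /app

# Sources are copied into /app instead of /tmp
COPY . .

# Assumed install step: a virtual environment plus the project's TF extra
RUN python -m venv /app/venv \
    && pip install --upgrade pip \
    && pip install -e ".[tf]" \
    && rm -rf /root/.cache/pip

# Write access for arbitrary UIDs (Kubernetes/OpenShift do not run containers as root)
RUN chmod -R a+w /app
```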

Commit: Update Dockerfile to Use NVIDIA's TensorFlow Container
@ffalkenberg ffalkenberg changed the title Update Dockerfile (GPU Support, Workdir) Update Dockerfile (GPU Support, Workdir, Permissions) Sep 12, 2023
@ffalkenberg
Contributor Author

@felixdittrich92 can you add a reviewer?

@felixdittrich92
Contributor

Hi @ffalkenberg 👋,

Thanks a lot for the PR.

Added @odulcy-mindee because he works currently on the requested docker features :)

Dockerfile (resolved review thread)
@odulcy-mindee
Collaborator

Hey @ffalkenberg ! Thank you for the PR! 👍
I'll have a look at it !

Dockerfile (outdated diff)
@@ -1,19 +1,20 @@
-FROM python:3.8-slim
+FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3
Collaborator

I tried using this image, but it's quite heavy:

# docker image ls
nvcr.io/nvidia/tensorflow   22.10.1-tf2-py3   bc0bd3236830   10 months ago   14.4GB
tensorflow/tensorflow       2.12.0-gpu        cbe3a4f4c2a0   5 months ago    6.8GB

especially when compared to the TensorFlow image with GPU support.

Additionally, TensorFlow is already included in this image. However, below you are using a virtualenv, which will result in TensorFlow being reinstalled inside the virtual environment. Also, there is no way to properly pin a TensorFlow version here.

Your suggestion to use another image with Nvidia drivers is good, but I would like to explore alternative images or consider creating one ourselves.
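For comparison, pinning the framework version with the official image would just be a matter of the tag, e.g. (sketch only, tag taken from the listing above):

```dockerfile
# Example: official TensorFlow GPU image pinned to an explicit version
FROM tensorflow/tensorflow:2.12.0-gpu
```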

Contributor Author
@ffalkenberg commented Sep 13, 2023

@odulcy-mindee, the image does indeed seem oversized. I initially just picked the first image that supported TensorFlow GPU and CUDA. If the mentioned tensorflow-gpu image meets our needs (GPU workloads using CUDA), it's preferable.
From my experience, though, when crafting your own image, starting from a CUDA base image is key.

Dockerfile (resolved review thread)
Dockerfile (outdated diff)
&& rm -rf /root/.cache/pip
&& rm -rf /var/lib/apt/lists/*

RUN python -m venv /app/venv \
Collaborator

We can use a virtual environment, indeed, but since it is a Docker container, I think it's still fine to install directly into the system.
WDYT @ffalkenberg @felixdittrich92 ?

Contributor

agree

Contributor Author

@odulcy-mindee Thank you for your insights on the use of virtual environments within Docker. I introduced the venv primarily to address a specific challenge I encountered. During runtime, I needed to install additional packages inside the container, such as Streamlit for demonstrations. Without the virtual environment, I ran into permission issues due to not being root.

If there's an alternative approach that allows this flexibility without the need for a venv, I'm absolutely open to it. I'm always in favor of simplifying our setup while ensuring functionality.
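One possible alternative (an assumption sketched here, not something settled in this thread) would be to keep the system-wide install and point user-level installs at a writable prefix instead of using a venv:

```dockerfile
# Sketch / assumption: allow non-root runtime installs (e.g. Streamlit) without a venv
ENV PYTHONUSERBASE=/app/.local
ENV PATH=/app/.local/bin:$PATH
RUN mkdir -p /app/.local && chmod -R a+w /app/.local
# At container runtime, a non-root user can then do:
#   pip install --user streamlit
```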

@felixT2K
Contributor

@odulcy-mindee
But if we create a job for Docker Hub (as discussed internally), do we still need this single Dockerfile?
Otherwise we would need four (tf-gpu / pt-gpu / tf-cpu / pt-cpu)?

@odulcy-mindee
Collaborator

@felixT2K yeah, good question. It depends on whether we can fit everything in one Dockerfile and then use --build-arg to customize it for each version.

Personally, I'd like to have only one Dockerfile that the CI uses, but one that users can still build locally if needed.

@felixT2K
Contributor

felixT2K commented Sep 13, 2023

> @felixT2K yeah, good question. It depends on whether we can fit everything in one Dockerfile and then use --build-arg to customize it for each version.
>
> Personally, I'd like to have only one Dockerfile that the CI uses, but one that users can still build locally if needed.

This sounds much better to me. @ffalkenberg would that be an option?
Because we have planned to publish multiple images on Docker Hub, it would be good to have one flexible Dockerfile that the CI can use to push multiple versions (cpu | gpu / tf | pt / py3.8 | py3.9 | py3.10).
This should also match your use case, so I would suggest updating your PR depending on the needs :)
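As a rough illustration of that idea (names and defaults below are assumptions, not an agreed design):

```dockerfile
# Sketch: one Dockerfile, customized per build via --build-arg
ARG BASE_IMAGE=tensorflow/tensorflow:latest-gpu
FROM ${BASE_IMAGE}

# "tf" or "torch", matching the project's install extras (assumed)
ARG FRAMEWORK=tf

WORKDIR /app
COPY . .
RUN pip install --upgrade pip \
    && pip install -e ".[${FRAMEWORK}]"
```

A CPU PyTorch build would then look something like `docker build --build-arg BASE_IMAGE=python:3.9-slim --build-arg FRAMEWORK=torch .` (again, illustrative tags only).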

@odulcy-mindee
Collaborator

We can take some inspiration from https://github.com/tensorflow/build/tree/master/tensorflow_runtime_dockerfiles for TensorFlow. I'm looking for a similar resource for PyTorch.

@felixT2K
Contributor

felixT2K commented Sep 13, 2023

> We can take some inspiration from https://github.com/tensorflow/build/tree/master/tensorflow_runtime_dockerfiles for TensorFlow. I'm looking for a similar resource for PyTorch.

https://github.com/cnstark/pytorch-docker/blob/main/docker/ubuntu/Dockerfile or
https://github.com/pytorch/serve/blob/master/docker/Dockerfile
with some smaller modifications?
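For instance, a PyTorch variant might look roughly like this (purely a sketch; the CUDA base tag and install steps are assumptions, not taken from those files):

```dockerfile
# Sketch / assumption: PyTorch variant built on an NVIDIA CUDA runtime image
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN pip3 install --upgrade pip \
    && pip3 install -e ".[torch]"
```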

@odulcy-mindee
Collaborator

@felixT2K Ah, good idea! 👍

@ffalkenberg
Contributor Author

Yes, that sounds good to me. You can push changes to my branch, or close this and create a new branch? Either way is fine for me 😊

@felixT2K
Contributor

> Yes, that sounds good to me. You can push changes to my branch, or close this and create a new branch? Either way is fine for me 😊

Do you not want to work on it? 🤗

@ffalkenberg
Contributor Author

Sure, as soon as I have some free time.

@felixT2K
Contributor

> Sure, as soon as I have some free time.

alright no stress :)

Commits: "tensorflow/tensorflow:latest-gpu as base image", "removed venv again"
@ffalkenberg
Contributor Author

@odulcy-mindee @felixT2K

I've now made these changes:

  1. Switched to TensorFlow GPU Base Image: I've updated our base image to use the TensorFlow GPU version. This should provide better performance for those with GPU setups, but it's also compatible with CPU-only environments.

  2. Removed the venv Approach: As you mentioned, this works in most use cases. I've made the necessary adjustments to my private deployment.yaml so that additional installs are possible as non-root.

  3. Single Dockerfile for the Project: In my opinion, having multiple Dockerfiles for this project might overcomplicate things. The updated Dockerfile is designed to work seamlessly in both CPU and GPU environments.
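For reference, a rough sketch of what the revised file amounts to after these changes (the install line is an assumption):

```dockerfile
# Sketch only: official TF GPU image, no venv, writable /app
FROM tensorflow/tensorflow:latest-gpu

ENV DOCTR_CACHE_DIR=/app/.cache

WORKDIR /app
COPY . .

RUN pip install --upgrade pip \
    && pip install -e ".[tf]" \
    && rm -rf /root/.cache/pip

# Keep /app writable for non-root users (Kubernetes/OpenShift)
RUN chmod -R a+w /app
```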

Please review the changes and let me know if there are any concerns or suggestions. Thanks!

@felixdittrich92
Contributor

Hi @ffalkenberg 👋,
Thanks for the update.
We should make it possible to specify the base image and Python version via build args, because the idea is to use it also for PyTorch (TF: CPU/GPU | PT: CPU/GPU) and for different Python versions :)

Afterwards we can add a CI job to push prebuilt images for TF/PT to Docker Hub (in another PR).

@ffalkenberg
Contributor Author

ffalkenberg commented Sep 18, 2023

@felixdittrich92 Your proposal certainly introduces broader flexibility to the Docker image configuration, but I believe it extends beyond the scope of this pull request. My concerns are:

  1. Complexity and Maintenance: Adopting different base images introduces variability due to their unique libraries and tools. This can result in potential inconsistencies and, undoubtedly, a higher testing and maintenance burden.
  2. Purpose and Clarity: I'm not entirely certain of the motivation behind such extensive flexibility. While I acknowledge its potential advantages, I'd suggest considering this enhancement in a separate pull request.

Given your approach, it might be more fitting to construct the image from scratch, integrating elements like CUDA, Python, Torch, etc. However, juxtaposing this with the simplicity of the previous Dockerfile, I'm unsure if this complexity is warranted.

To strike a balance, I can offer to rename the current Dockerfile to gpu.Dockerfile, allowing it to coexist with the original one. This provides a clear distinction for now, and a more comprehensive overhaul can be contemplated by your team at a later stage. Alternatively, if you feel this PR doesn't align with the updated objectives, please feel free to close it.

@odulcy-mindee
Collaborator

@felixdittrich92 I'm fine to apply this modification to this file, it's good enough.

I'm working on a more generic Dockerfile in #1322.

@felixdittrich92
Contributor

> @felixdittrich92 I'm fine to apply this modification to this file, it's good enough.
>
> I'm working on a more generic Dockerfile in #1322.

Fine for me, it's your topic/task ^^

@codecov

codecov bot commented Sep 18, 2023

Codecov Report

Merging #1313 (f80476c) into main (7ab0ece) will not change coverage.
Report is 2 commits behind head on main.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1313   +/-   ##
=======================================
  Coverage   95.75%   95.75%           
=======================================
  Files         154      154           
  Lines        6902     6902           
=======================================
  Hits         6609     6609           
  Misses        293      293           
Flag        Coverage Δ
unittests   95.75% <ø> (ø)

Flags with carried forward coverage won't be shown.

see 2 files with indirect coverage changes

@felixdittrich92
Contributor

@odulcy-mindee you can merge this if you are fine with the changes.
I have only one question: if we build more flexible Dockerfiles in #1322, what's the reason to keep this Dockerfile in the repo root?
Personally, I also had in mind to create a dockerfiles folder to store the more generic ones (one TF and one PT), which would make the Dockerfile in the root obsolete.

@odulcy-mindee
Collaborator

> @odulcy-mindee you can merge this if you are fine with the changes. I have only one question: if we build more flexible Dockerfiles in #1322, what's the reason to keep this Dockerfile in the repo root? Personally, I also had in mind to create a dockerfiles folder to store the more generic ones (one TF and one PT), which would make the Dockerfile in the root obsolete.

Yeah, I totally agree with you. I'm still not entirely sure whether there will be just one file, but I'm working on it. Maybe one file, or a dockerfiles folder 🤷‍♂️

So maybe this Dockerfile will be obsolete in a few days, but I'm fine with releasing this modification if it can help people right now.

Collaborator
@odulcy-mindee left a comment

Thank you @ffalkenberg for your contribution ! 🚀

@felixdittrich92 merged commit 28e5375 into mindee:main on Sep 20, 2023
Labels: topic: docker (Docker-related) · type: misc (Miscellaneous)