
Update Dockerfile (GPU Support, Workdir, Permissions) #1313

Merged · 2 commits merged into mindee:main on Sep 20, 2023

Conversation

@ffalkenberg
Contributor
commented Sep 12, 2023

This PR updates the Dockerfile to use NVIDIA's TensorFlow container for better GPU support, changes the sources that are copied into the image, and creates a working directory. A combined sketch of the resulting Dockerfile follows the lists below.

Changes:

  1. Base Image Update:

    • Old: FROM python:3.8-slim
    • New: FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3

    The base image has been updated to use NVIDIA's TensorFlow container. This change ensures that the container is optimized for GPU usage and includes all necessary dependencies for TensorFlow to utilize NVIDIA GPUs.

  2. Environment Variables:

    • Added ENV DOCTR_CACHE_DIR=/app/.cache to specify a cache directory.
    • Added ENV PATH=/app/venv/bin:$PATH to include the virtual environment's bin directory in the PATH.
  3. Working Directory:

    • New: WORKDIR /app

    The working directory has been set to /app to provide a consistent location for application files.

  4. File Copying:

    • Old: Files were copied to /tmp.
    • New: All files are now copied directly to the /app directory with COPY . .
  5. Permissions:

    • New: chmod -R a+w /app

    Permissions have been updated to ensure that all users have write access to the /app directory. This change is crucial as some environments, like Kubernetes/OpenShift, do not allow containers to run as root. By adjusting the permissions, we ensure compatibility with such environments.

Additional Information:

  • General-Purpose Usage: The updated Dockerfile is more versatile due to the inclusion of additional sources. It's designed to be general-purpose, catering to a broader range of applications and supporting both CPU and GPU environments.

  • CUDA & cuDNN Dependency: The new base image from NVIDIA's TensorFlow container comes with CUDA and cuDNN pre-installed, ensuring that TensorFlow can leverage GPU acceleration out of the box. More details can be found on TensorFlow's official documentation and NVIDIA's container catalog.

  • Demo App: With the inclusion of more sources in the Dockerfile, running applications like the demo app becomes more straightforward and optimized.
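Putting the above together, here is a minimal sketch of how the described changes combine into one Dockerfile. The dependency installation step is an assumption for illustration; the PR's actual install commands may differ.

```dockerfile
# Sketch only: combines the changes described in this PR description.
FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3

ENV DOCTR_CACHE_DIR=/app/.cache
ENV PATH=/app/venv/bin:$PATH

# Consistent location for application files
WORKDIR /app

# Sources are copied into /app instead of /tmp
COPY . .

# Assumed install step: a virtual environment plus the project's TF extra
RUN python -m venv /app/venv \
    && pip install --upgrade pip \
    && pip install -e ".[tf]" \
    && rm -rf /root/.cache/pip

# Write access for arbitrary UIDs (Kubernetes/OpenShift do not run containers as root)
RUN chmod -R a+w /app
```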

Commit: Update Dockerfile to Use NVIDIA's TensorFlow Container
@ffalkenberg ffalkenberg changed the title Update Dockerfile (GPU Support, Workdir) Update Dockerfile (GPU Support, Workdir, Permissions) Sep 12, 2023
@ffalkenberg
Contributor Author

@felixdittrich92 can you add a reviewer?

@felixdittrich92
Contributor

Hi @ffalkenberg 👋,

Thanks a lot for the PR.

Added @odulcy-mindee because he works currently on the requested docker features :)

Dockerfile (resolved review thread)
@odulcy-mindee
Collaborator

Hey @ffalkenberg ! Thank you for the PR! 👍
I'll have a look at it !

Dockerfile (outdated diff)
@@ -1,19 +1,20 @@
-FROM python:3.8-slim
+FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3
Collaborator

I tried using this image, but it's quite heavy:

# docker image ls
nvcr.io/nvidia/tensorflow   22.10.1-tf2-py3   bc0bd3236830   10 months ago   14.4GB
tensorflow/tensorflow       2.12.0-gpu        cbe3a4f4c2a0   5 months ago    6.8GB

especially when compared to the TensorFlow image with GPU support.

Additionally, TensorFlow is already included in this image. However, below you are using a virtualenv, which will result in TensorFlow being reinstalled inside the virtual environment. Also, there is no way to properly pin a TensorFlow version here.

Your suggestion to use another image with Nvidia drivers is good, but I would like to explore alternative images or consider creating one ourselves.
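For comparison, pinning the framework version with the official image would just be a matter of the tag, e.g. (sketch only, tag taken from the listing above):

```dockerfile
# Example: official TensorFlow GPU image pinned to an explicit version
FROM tensorflow/tensorflow:2.12.0-gpu
```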

Contributor Author
@ffalkenberg commented Sep 13, 2023

@odulcy-mindee, the image does indeed seem oversized. I initially just picked the first image that supported TensorFlow GPU and CUDA. If the mentioned tensorflow-gpu image meets our needs (GPU workloads using CUDA), it's preferable.
From my experience, though, when crafting your own image, starting from a CUDA base image is key.

Dockerfile (resolved review thread)
Dockerfile (outdated diff)
&& rm -rf /root/.cache/pip
&& rm -rf /var/lib/apt/lists/*

RUN python -m venv /app/venv \
Collaborator

We can use a virtual environment, indeed, but since it is a Docker container, I think it's still fine to install directly into the system.
WDYT @ffalkenberg @felixdittrich92 ?

Contributor

agree

Contributor Author

@odulcy-mindee Thank you for your insights on the use of virtual environments within Docker. I introduced the venv primarily to address a specific challenge I encountered. During runtime, I needed to install additional packages inside the container, such as Streamlit for demonstrations. Without the virtual environment, I ran into permission issues due to not being root.

If there's an alternative approach that allows this flexibility without the need for a venv, I'm absolutely open to it. I'm always in favor of simplifying our setup while ensuring functionality.
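One possible alternative (an assumption sketched here, not something settled in this thread) would be to keep the system-wide install and point user-level installs at a writable prefix instead of using a venv:

```dockerfile
# Sketch / assumption: allow non-root runtime installs (e.g. Streamlit) without a venv
ENV PYTHONUSERBASE=/app/.local
ENV PATH=/app/.local/bin:$PATH
RUN mkdir -p /app/.local && chmod -R a+w /app/.local
# At container runtime, a non-root user can then do:
#   pip install --user streamlit
```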

@felixT2K
Contributor

@odulcy-mindee
But if we create a job for Docker Hub (as discussed internally), do we still need this single Dockerfile?
Otherwise we would need four (tf-gpu / pt-gpu / tf-cpu / pt-cpu)?

@odulcy-mindee
Collaborator

@felixT2K yeah, good question. It depends on whether we can fit everything in one Dockerfile and then use --build-arg to customize it for each version.

Personally, I'd like to have only one Dockerfile that the CI uses, but one that users can still build locally if needed.

@felixT2K
Contributor

felixT2K commented Sep 13, 2023

> @felixT2K yeah, good question. It depends on whether we can fit everything in one Dockerfile and then use --build-arg to customize it for each version.
>
> Personally, I'd like to have only one Dockerfile that the CI uses, but one that users can still build locally if needed.

This sounds much better to me. @ffalkenberg would that be an option?
Because we have planned to publish multiple images on Docker Hub, it would be good to have one flexible Dockerfile that the CI can use to push multiple versions (cpu | gpu / tf | pt / py3.8 | py3.9 | py3.10).
This should also match your use case, so I would suggest updating your PR depending on the needs :)
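As a rough illustration of that idea (names and defaults below are assumptions, not an agreed design):

```dockerfile
# Sketch: one Dockerfile, customized per build via --build-arg
ARG BASE_IMAGE=tensorflow/tensorflow:latest-gpu
FROM ${BASE_IMAGE}

# "tf" or "torch", matching the project's install extras (assumed)
ARG FRAMEWORK=tf

WORKDIR /app
COPY . .
RUN pip install --upgrade pip \
    && pip install -e ".[${FRAMEWORK}]"
```

A CPU PyTorch build would then look something like `docker build --build-arg BASE_IMAGE=python:3.9-slim --build-arg FRAMEWORK=torch .` (again, illustrative tags only).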

@odulcy-mindee
Collaborator

We can take some inspiration from https://github.com/tensorflow/build/tree/master/tensorflow_runtime_dockerfiles for TensorFlow. I'm looking for a similar resource for PyTorch.

@felixT2K
Contributor

felixT2K commented Sep 13, 2023

> We can take some inspiration from https://github.com/tensorflow/build/tree/master/tensorflow_runtime_dockerfiles for TensorFlow. I'm looking for a similar resource for PyTorch.

https://github.com/cnstark/pytorch-docker/blob/main/docker/ubuntu/Dockerfile or
https://github.com/pytorch/serve/blob/master/docker/Dockerfile
with some smaller modifications?
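For instance, a PyTorch variant might look roughly like this (purely a sketch; the CUDA base tag and install steps are assumptions, not taken from those files):

```dockerfile
# Sketch / assumption: PyTorch variant built on an NVIDIA CUDA runtime image
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN pip3 install --upgrade pip \
    && pip3 install -e ".[torch]"
```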

@odulcy-mindee
Collaborator

@felixT2K Ah, good idea! 👍

@ffalkenberg
Contributor Author

Yes, that sounds good to me. You can push changes to my branch, or close this and create a new branch? Either way is fine for me 😊

@felixT2K
Contributor

> Yes, that sounds good to me. You can push changes to my branch, or close this and create a new branch? Either way is fine for me 😊

Do you not want to work on it? 🤗

@ffalkenberg
Contributor Author

Sure, as soon as I have some free time.

@felixT2K
Contributor

> Sure, as soon as I have some free time.

alright no stress :)

Commits: "tensorflow/tensorflow:latest-gpu as base image", "removed venv again"
@ffalkenberg
Contributor Author

@odulcy-mindee @felixT2K

I've now made these changes:

  1. Switched to TensorFlow GPU Base Image: I've updated our base image to use the TensorFlow GPU version. This should provide better performance for those with GPU setups, but it's also compatible with CPU-only environments.

  2. Removed the venv Approach: As you mentioned, this works in most use cases. I've made the necessary adjustments to my private deployment.yaml so that additional installs are possible as non-root.

  3. Single Dockerfile for the Project: In my opinion, having multiple Dockerfiles for this project might overcomplicate things. The updated Dockerfile is designed to work seamlessly in both CPU and GPU environments.
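For reference, a rough sketch of what the revised file amounts to after these changes (the install line is an assumption):

```dockerfile
# Sketch only: official TF GPU image, no venv, writable /app
FROM tensorflow/tensorflow:latest-gpu

ENV DOCTR_CACHE_DIR=/app/.cache

WORKDIR /app
COPY . .

RUN pip install --upgrade pip \
    && pip install -e ".[tf]" \
    && rm -rf /root/.cache/pip

# Keep /app writable for non-root users (Kubernetes/OpenShift)
RUN chmod -R a+w /app
```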

Please review the changes and let me know if there are any concerns or suggestions. Thanks!

@felixdittrich92
Contributor

Hi @ffalkenberg 👋,
Thanks for the update.
We should make it possible to specify the base image and Python version via build args, because the idea is to use it also for PyTorch (TF: CPU/GPU | PT: CPU/GPU) and for different Python versions :)

Afterwards we can add a CI job to push prebuilt images for TF/PT to Docker Hub (in another PR).

@ffalkenberg
Contributor Author

ffalkenberg commented Sep 18, 2023

@felixdittrich92 Your proposal certainly introduces broader flexibility to the Docker image configuration, but I believe it extends beyond the scope of this pull request. My concerns are:

  1. Complexity and Maintenance: Adopting different base images introduces variability due to their unique libraries and tools. This can result in potential inconsistencies and, undoubtedly, a higher testing and maintenance burden.
  2. Purpose and Clarity: I'm not entirely certain of the motivation behind such extensive flexibility. While I acknowledge its potential advantages, I'd suggest considering this enhancement in a separate pull request.

Given your approach, it might be more fitting to construct the image from scratch, integrating elements like CUDA, Python, Torch, etc. However, juxtaposing this with the simplicity of the previous Dockerfile, I'm unsure if this complexity is warranted.

To strike a balance, I can offer to rename the current Dockerfile to gpu.Dockerfile, allowing it to coexist with the original one. This provides a clear distinction for now, and a more comprehensive overhaul can be contemplated by your team at a later stage. Alternatively, if you feel this PR doesn't align with the updated objectives, please feel free to close it.

@odulcy-mindee
Collaborator

@felixdittrich92 I'm fine to apply this modification to this file, it's good enough.

I'm working on a more generic Dockerfile in #1322.

@felixdittrich92
Contributor

> @felixdittrich92 I'm fine to apply this modification to this file, it's good enough.
>
> I'm working on a more generic Dockerfile in #1322.

Fine for me, it's your topic/task ^^

@codecov

codecov bot commented Sep 18, 2023

Codecov Report

Merging #1313 (f80476c) into main (7ab0ece) will not change coverage.
Report is 2 commits behind head on main.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1313   +/-   ##
=======================================
  Coverage   95.75%   95.75%           
=======================================
  Files         154      154           
  Lines        6902     6902           
=======================================
  Hits         6609     6609           
  Misses        293      293           
Flag        Coverage Δ
unittests   95.75% <ø> (ø)

Flags with carried forward coverage won't be shown.

see 2 files with indirect coverage changes

@felixdittrich92
Contributor

@odulcy-mindee you can merge this if you are fine with the changes.
I have only one question: if we build more flexible Dockerfiles in #1322, what's the reason to keep this Dockerfile in the repo root?
Personally, I also had in mind to create a dockerfiles folder to store the more generic ones (one TF and one PT), which would make the Dockerfile in the root obsolete.

@odulcy-mindee
Collaborator

> @odulcy-mindee you can merge this if you are fine with the changes. I have only one question: if we build more flexible Dockerfiles in #1322, what's the reason to keep this Dockerfile in the repo root? Personally, I also had in mind to create a dockerfiles folder to store the more generic ones (one TF and one PT), which would make the Dockerfile in the root obsolete.

Yeah, I totally agree with you. I'm still not entirely sure whether there will be just one file, but I'm working on it. Maybe one file, or a dockerfiles folder 🤷‍♂️

So maybe this Dockerfile will be obsolete in a few days, but I'm fine with releasing this modification if it can help people right now.

Collaborator
@odulcy-mindee left a comment

Thank you @ffalkenberg for your contribution ! 🚀

@felixdittrich92 merged commit 28e5375 into mindee:main on Sep 20, 2023
Labels: topic: docker (Docker-related) · type: misc (Miscellaneous)