Update Dockerfile (GPU Support, Workdir, Permissions) #1313
Conversation
Update Dockerfile to Use NVIDIA's TensorFlow Container
@felixdittrich92 can you add a reviewer?
Hi @ffalkenberg 👋, thanks a lot for the PR. Added @odulcy-mindee because he is currently working on the requested Docker features :)
Hey @ffalkenberg! Thank you for the PR! 👍
Dockerfile (Outdated)
@@ -1,19 +1,20 @@
- FROM python:3.8-slim
+ FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3
I tried using this image, but it's quite heavy:
# docker image ls
nvcr.io/nvidia/tensorflow 22.10.1-tf2-py3 bc0bd3236830 10 months ago 14.4GB
tensorflow/tensorflow 2.12.0-gpu cbe3a4f4c2a0 5 months ago 6.8GB
especially when compared to the TensorFlow image with GPU support.
Additionally, TensorFlow is already included in this image. However, below you are using a virtualenv, which will result in a reinstallation of TensorFlow within the virtual environment. Also, there is no way to properly enforce a TensorFlow version here.
Your suggestion to use another image with Nvidia drivers is good, but I would like to explore alternative images or consider creating one ourselves.
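For comparison, a minimal sketch of what a Dockerfile based on the lighter tensorflow/tensorflow image could look like; the 2.12.0-gpu tag and the pip install line are illustrative assumptions, not the project's actual Dockerfile:

# Hypothetical sketch: smaller GPU base image with TensorFlow already bundled.
FROM tensorflow/tensorflow:2.12.0-gpu

WORKDIR /app
COPY . .

# TensorFlow comes from the base image, so its version is pinned by the image tag;
# only the project and its remaining dependencies are installed on top.
RUN pip install --no-cache-dir . \
    && rm -rf /root/.cache/pip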
@odulcy-mindee, the image does indeed seem oversized. I initially just picked the first image that supported TensorFlow GPU and CUDA. If the mentioned tensorflow-gpu image meets our needs (GPU workloads using CUDA), it's preferable.
In my experience, though, when crafting your own image, starting with a CUDA base image is key.
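For illustration, that build-it-yourself route could look roughly like this; the CUDA tag and the pinned TensorFlow version are assumptions, not a tested recommendation:

# Hypothetical sketch: start from a plain CUDA/cuDNN runtime image and add Python + TensorFlow.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Unlike with the bundled nvcr.io image, the framework version can be pinned explicitly here.
RUN pip3 install --no-cache-dir "tensorflow==2.12.*"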
Dockerfile (Outdated)
&& rm -rf /root/.cache/pip
&& rm -rf /var/lib/apt/lists/*

RUN python -m venv /app/venv \
We can use a virtual environment, indeed, but as it is a Docker container, I think it's still fine to install directly in the system.
WDYT @ffalkenberg @felixdittrich92?
agree
@odulcy-mindee Thank you for your insights on the use of virtual environments within Docker. I introduced the venv primarily to address a specific challenge I encountered. During runtime, I needed to install additional packages inside the container, such as Streamlit for demonstrations. Without the virtual environment, I ran into permission issues due to not being root.
If there's an alternative approach that allows for this flexibility without the need for a venv, I'm absolutely open to it. I'm always in favor of simplifying our setup while ensuring functionality.
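For context, one way such a runtime install could work without a venv is a user-level install with a writable HOME; this is only a sketch (the image name doctr-tf:latest, the UID, and the Streamlit example are placeholders), not the approach this PR ended up adopting:

# Run as an arbitrary non-root UID and install extra packages into the user site-packages.
docker run --rm -it -u 1001:0 -e HOME=/tmp doctr-tf:latest \
    sh -c "pip install --user streamlit && python -c 'import streamlit'"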
@odulcy-mindee
@felixT2K yeah, good question. It'll depend on whether we can fit everything in one Dockerfile and then use [...]. Personally, I'd like to have only one Dockerfile that the CI uses, but one that is still usable by users so they can build the Docker container locally if needed.
This sounds much better to me. @ffalkenberg would it be an option?
We can take some inspiration from https://github.com/tensorflow/build/tree/master/tensorflow_runtime_dockerfiles for TensorFlow. I'm looking for a similar resource for PyTorch.
https://github.com/cnstark/pytorch-docker/blob/main/docker/ubuntu/Dockerfile or [...]
@felixT2K Ah, good idea! 👍
Yes, that sounds good to me. You can push changes to my branch, or close this and create a new branch? Either way is fine for me 😊
Do you not want to work on it? 🤗
Sure, as soon as I have some free time.
Alright, no stress :)
I've now made these changes:
- tensorflow/tensorflow:latest-gpu as base image
- removed venv again
Please review the changes and let me know if there are any concerns or suggestions. Thanks!
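For readers skimming the thread, the end state described here is roughly the following; this is a sketch reconstructed from the discussion and the PR summary further below, not the exact committed file:

FROM tensorflow/tensorflow:latest-gpu

# Cache location used by docTR at runtime.
ENV DOCTR_CACHE_DIR=/app/.cache

WORKDIR /app
COPY . .

# No virtual environment anymore: install straight into the system interpreter.
RUN pip install --no-cache-dir . \
    && rm -rf /root/.cache/pip

# Let non-root users (e.g. on Kubernetes/OpenShift) write into /app.
RUN chmod -R a+w /app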
Hi @ffalkenberg 👋, [...] Afterwards we can add a CI job to push prebuilt images for TF/PT to Docker Hub (in another PR).
@felixdittrich92 Your proposal certainly introduces broader flexibility to the Docker image configuration, but I believe it extends beyond the scope of this pull request. My concerns are: [...]
Given your approach, it might be more fitting to construct the image from scratch, integrating elements like CUDA, Python, Torch, etc. However, juxtaposing this with the simplicity of the previous Dockerfile, I'm unsure if this complexity is warranted. To strike a balance, I can offer to rename the current Dockerfile to [...]
@felixdittrich92 I'm fine with applying this modification to this file; it's good enough. I'm working on a more generic Dockerfile in #1322.
Fine for me, it's your topic/task ^^
Codecov Report
@@ Coverage Diff @@
## main #1313 +/- ##
=======================================
Coverage 95.75% 95.75%
=======================================
Files 154 154
Lines 6902 6902
=======================================
Hits 6609 6609
Misses 293 293
Flags with carried forward coverage won't be shown.
@odulcy-mindee you can merge this if you are fine with the changes.
Yeah, I totally agree with you. I'm still not entirely sure whether there will be just one file, but I'm working on it. Maybe one file, or a folder [...].
So maybe this Dockerfile will be obsolete in a few days, but I'm fine with releasing this modification if it can help people right now.
Thank you @ffalkenberg for your contribution! 🚀
This PR updates the Dockerfile to leverage NVIDIA's TensorFlow container (ensuring better GPU support), changes the sources, and creates a workdir.
Changes:
Base Image Update:
- Old: FROM python:3.8-slim
- New: FROM nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3
The base image has been updated to use NVIDIA's TensorFlow container. This change ensures that the container is optimized for GPU usage and includes all necessary dependencies for TensorFlow to utilize NVIDIA GPUs.
Environment Variables:
- ENV DOCTR_CACHE_DIR=/app/.cache to specify a cache directory.
- ENV PATH=/app/venv/bin:$PATH to include the virtual environment's bin directory in the PATH.
Working Directory:
- WORKDIR /app
- The working directory has been set to /app to provide a consistent location for application files.
File Copying:
- Files are no longer copied to /tmp; they are copied into the /app directory with COPY . .
Permissions:
- chmod -R a+w /app
- Permissions have been updated to ensure that all users have write access to the /app directory. This change is crucial as some environments, like Kubernetes/OpenShift, do not allow containers to run as root. By adjusting the permissions, we ensure compatibility with such environments.
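As a quick illustration of the non-root scenario this addresses (the image name and UID below are placeholders, not from the PR):

# Kubernetes/OpenShift often assign an arbitrary non-root UID; with a+w permissions
# on /app, the cache directory stays writable for such users.
docker run --rm -u 1001:0 doctr-tf:latest \
    sh -c "mkdir -p /app/.cache && touch /app/.cache/probe && echo '/app is writable as non-root'"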
Additional Information:
- General-Purpose Usage: The updated Dockerfile is more versatile due to the inclusion of additional sources. It's designed to be general-purpose, catering to a broader range of applications and supporting both CPU and GPU environments.
- CUDA & cuDNN Dependency: The new base image from NVIDIA's TensorFlow container comes with CUDA and cuDNN pre-installed, ensuring that TensorFlow can leverage GPU acceleration out of the box. More details can be found in TensorFlow's official documentation and NVIDIA's container catalog.
- Demo App: With the inclusion of more sources in the Dockerfile, running applications like the demo app becomes more straightforward and optimized.