
Docker image for testing and experimenting with Keras #3035

Merged
merged 12 commits into keras-team:master on Jul 27, 2016

Conversation

dosht
Contributor

@dosht dosht commented Jun 21, 2016

To avoid issues installing Theano or CUDA on any Linux distro, this PR automates the environment with a Docker image for Keras and CUDA that can share the host's GPUs with the container via Nvidia's script.

  • Docker image with CUDA support on Ubuntu 14.04
  • nvidia-docker script to forward the GPU to the container
  • Makefile to simplify the docker commands for build, run, test, etc.
  • Useful tools like jupyter notebook, ipdb and sklearn for experiments

The Makefile is optional; it just makes the docker commands easier, e.g.

$ make test  # runs all tests in the docker container
$ make ipython  # runs ipython shell with current Keras in PYTHONPATH
$ make notebook  # runs jupyter notebook with current Keras in PYTHONPATH

This can be customized further by choosing a specific GPU and a dataset directory, e.g.

$ make notebook GPU=0 DATA=/home/fchollet/datasets
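As a rough sketch, a target like the one above presumably expands to a docker invocation along these lines (the image name keras and the exact flags are assumptions, not the PR's actual contents):

```shell
# Roughly what `make notebook GPU=0 DATA=/home/fchollet/datasets` might run:
# nvidia-docker forwards GPU 0 into the container, the current checkout is
# mounted at /src, and the dataset directory at /data.
GPU=0 nvidia-docker run -it \
    -v `pwd`:/src \
    -v /home/fchollet/datasets:/data \
    --net=host \
    keras \
    jupyter notebook
```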

I've already been using this for a while and it makes my life easier. I will update the README as well if you like this suggestion.

Thanks,

pip install theano ipdb pytest pytest-cov python-coveralls pytest-xdist pep8 pytest-pep8 && \
conda clean -yt

ENV THEANO_FLAGS='mode=FAST_RUN,device=gpu,nvcc.fastmath=True,floatX=float32'
Contributor

Why not echo a .theanorc file?
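For reference, writing the same flags into a .theanorc from the Dockerfile could look roughly like this (a sketch of the suggestion, not the PR's actual change):

```dockerfile
# Hypothetical equivalent of the ENV THEANO_FLAGS line above,
# written to /root/.theanorc so Theano picks it up at import time.
RUN printf '[global]\nmode = FAST_RUN\ndevice = gpu\nfloatX = float32\n\n[nvcc]\nfastmath = True\n' > /root/.theanorc
```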

@tboquet
Contributor

tboquet commented Jun 21, 2016

Nice addition! Could you update the Makefile so it can be used with the nvidia-docker plugin?
The script is deprecated and the plugin is stable right now.

@fchollet
Collaborator

  • This seems to be exclusively for use with Theano. Any reason why? TensorFlow is gaining in popularity and will eventually be the default backend for Keras.
  • Who would take care of maintaining and updating this Docker image? I am not in a position to take responsibility for it.
  • Not everyone is familiar with Docker. I think it would be very important to have a tutorial somewhere on how to use this Docker image to get started with Keras (e.g. on an AWS GPU instance).

@dosht
Contributor Author

dosht commented Jun 22, 2016

@fchollet, for the first point, I added TensorFlow as the default backend.
It can be switched to Theano with: make notebook BACKEND=theano
@tboquet, I removed the deprecated nvidia-docker script and added .theanorc
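Presumably the switch works by passing the KERAS_BACKEND environment variable, which Keras reads at import time, into the container; a sketch with the image name assumed:

```shell
# Hypothetical expansion of `make notebook BACKEND=theano`:
# -e sets KERAS_BACKEND inside the container so Keras loads the Theano backend.
nvidia-docker run -it -v `pwd`:/src -e KERAS_BACKEND=theano keras jupyter notebook
```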

@@ -0,0 +1,52 @@
FROM nvidia/cuda:7.5-cudnn4-devel
Contributor

cuDNN v5 is available 😃 :
FROM nvidia/cuda:7.5-cudnn5-devel

Collaborator

And quite a bit faster too!

@fchollet
Collaborator

Can someone more familiar with Docker than me review this?

@tboquet
Contributor

tboquet commented Jun 23, 2016

I can suggest some small changes, but it would be good to have a third pass just to be sure everything is good! Another point: @dosht, did you try to launch the tests using Theano and TensorFlow in the container?

pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp35-cp35m-linux_x86_64.whl && \
pip install git+git://github.com/Theano/Theano.git && \
pip install ipdb pytest pytest-cov python-coveralls coverage==3.7.1 pytest-xdist pep8 pytest-pep8 && \
conda install Pillow scikit-learn notebook pandas matplotlib nose pyyaml six h5py && \
Contributor

Don't you need to install the libhdf5-dev deb package for h5py to work properly?
Conda will link the compiled binaries, but you may get an error because the underlying C library is not there.
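A sketch of what that suggestion would look like in the Dockerfile (package name as on Ubuntu 14.04):

```dockerfile
# Install the HDF5 C library and headers before the Python bindings,
# so h5py has the underlying library to link against at runtime.
RUN apt-get update && \
    apt-get install -y --no-install-recommends libhdf5-dev && \
    rm -rf /var/lib/apt/lists/*
```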

@henry0312
Contributor

Why do you use conda to install Python?
I think we can follow the approach in docker-library/python, which is the official Docker image for Python.
(cf. https://github.com/docker-library/python/blob/master/3.5/Dockerfile)

@dosht
Contributor Author

dosht commented Jun 26, 2016

@tboquet When running the tests, I got some failures; most of them were "method not found".
When I switched to device=cpu and the Theano backend, I got only this failure:

[gw0] linux -- Python 3.5.1 /opt/conda/bin/python
Slave 'gw0' crashed while running 'tests/keras/backend/test_backends.py::TestBackend::()::test_conv2d' 
======== 1 failed, 168 passed, 1 skipped in 484.22 seconds =======

@fchollet
Collaborator

Consider moving the two files to a docker folder to avoid cluttering the root folder.

Would you consider authoring a blog post on blog.keras.io to explain how to use this Docker image?

@dosht
Contributor Author

dosht commented Jul 3, 2016

@fchollet I moved the Docker files into a subdirectory and will write a blog post draft soon.
@henry0312 I used conda because I found it simpler, but I will follow docker-library/python and give it a try. I also added the .theanorc in a separate file.

@tboquet
Contributor

tboquet commented Jul 4, 2016

@dosht conda is used on Travis for the tests; it would be consistent to use it in your container as well.

I will try to run the tests and compare the behaviour with some of my containers.

In my previous comment I mentioned Windows. Making the container work on either a Linux distro or Windows is not trivial using their new native application or the VM. I thought it would be possible using the REST API, but it's not the best way to use the plugin and it would become messy.

Another point, it would be neat to have an automated build on Dockerhub 😃 .

@@ -1,4 +1,4 @@
FROM nvidia/cuda:7.5-cudnn4-devel
FROM nvidia/cuda:7.5-cudnn5-devel
Contributor

@dosht Have you tested whether TensorFlow installed from the package (.whl) works with cuDNN 5? The TF docs state:

The GPU version (Linux only) works best with Cuda Toolkit 7.5 and cuDNN v4. other versions are supported (Cuda toolkit >= 7.0 and cuDNN 6.5(v2), 7.0(v3), v5) only when installing from sources.

(see https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#requirements).

@lukovkin
Contributor

lukovkin commented Jul 4, 2016

@dosht
Cool idea, adding a Docker container for Keras. We have been using one for a while.
Some comments:

  • Have you considered using docker-compose instead of make?
  • Do you think it makes sense to install the GPU-backed TensorFlow from source instead of the .whl? It would allow building from the latest master branch and using a non-default cuDNN, but it fails to build automatically on DockerHub (it hits the 2-hour time limit). TF has a Dockerfile for it (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu), but Python should be changed to 3.5.

@dosht
Contributor Author

dosht commented Jul 17, 2016

@tboquet, I added this repo on DockerHub: https://hub.docker.com/r/dosht/keras/, but the namespace will change later, of course.

@lukovkin, starting from the TensorFlow image is a good idea, but it's still using cuDNN 4. What do you think about that?
docker-compose is also a good idea, and I'm trying it out to see how it can fit our setup.

@dosht
Contributor Author

dosht commented Jul 17, 2016

@fchollet, I added a README file describing how it works; this might change once the automated build on DockerHub is set up, and if we use docker-compose instead of make.

I will convert the readme into a blog post when this PR is done.

@fchollet
Collaborator

@dosht sounds good. There seem to be multiple typos in the README; please have someone proofread it.

@fchollet
Collaborator

@lukovkin, @tboquet, @henry0312 what do you guys think, are we all clear here?

# Python
ARG python_version=3.5.1
RUN conda install -y python=${python_version} && \
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp35-cp35m-linux_x86_64.whl && \
Contributor

It's better to define the version of TensorFlow before this line, I guess.

@henry0312
Contributor

@dosht By the way, how do you install Keras in your Dockerfile?
I can see Theano, TensorFlow and so on, but I don't see how Keras itself is installed.

RUN conda install -y python=${python_version} && \
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp35-cp35m-linux_x86_64.whl && \
pip install git+git://github.com/Theano/Theano.git && \
pip install ipdb pytest pytest-cov python-coveralls coverage==3.7.1 pytest-xdist pep8 pytest-pep8 && \
Contributor

Is pip install git+git://github.com/fchollet/keras.git needed?

Contributor Author

@dosht dosht Jul 26, 2016

I was thinking not to install Keras, but to use the current Keras code by appending it to PYTHONPATH:
ENV PYTHONPATH='/src/:$PYTHONPATH'
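The effect of that ENV line can be demonstrated outside Docker too: a directory placed on PYTHONPATH shadows any installed copy of the package. A small sketch (the dev-checkout marker and the /tmp path are made up for illustration):

```shell
# Simulate /src holding a Keras checkout: a package directory earlier on
# PYTHONPATH wins over anything installed in site-packages.
mkdir -p /tmp/src/keras
echo "__version__ = 'dev-checkout'" > /tmp/src/keras/__init__.py
PYTHONPATH=/tmp/src python3 -c "import keras; print(keras.__version__)"
# prints: dev-checkout
```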

Contributor

I get it.

@henry0312
Contributor

For visualization,

pip install pydot_ng
sudo apt-get install graphviz

may be needed?
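In Dockerfile form, that suggestion would look roughly like this (a sketch; graphviz provides the dot binary, pydot_ng the Python bindings):

```dockerfile
# Hypothetical addition for Keras model visualization support.
RUN apt-get update && \
    apt-get install -y --no-install-recommends graphviz && \
    rm -rf /var/lib/apt/lists/* && \
    pip install pydot_ng
```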

@gw0
Contributor

gw0 commented Jul 20, 2016

I have developed and have been using a set of Keras Docker containers for some time now. The philosophy is to make it simple to understand (no make files or nvidia-docker dependencies), to have tags for all possible versions and variants (Python 2/3, Theano/TensorFlow, CPU/GPU), and to have minimal images (for production use) plus a full image with all batteries included (jupyter, ipython). All containers are managed only with the basic docker command, so that all Docker users are familiar with how to use it. Check out:

For example, for quick experiments and development of a model, one would mount /home/user/project1 into the container and start a one-time-use IPython shell:

# for python 2
$ docker run -it --rm -v=/home/user/project1:/srv gw000/keras-full ipython2
# for python 3
$ docker run -it --rm -v=/home/user/project1:/srv gw000/keras-full ipython3

Or start a Jupyter Notebook in background accessible on http://127.0.0.1:8888/:

$ docker run -d --name keras-full -p=6006:6006 -p=8888:8888 -v=/home/user/project1:/srv gw000/keras-full

For an already developed Keras model that is ready for production, one would specify the exact version, backend, and device to use (so that it is a reproducible and known environment):

$ docker run -d --name project1 -v /srv/project1:/srv/project1 gw000/keras:1.0.4-py2-tf-cpu /srv/project1/run.py

For more complex projects, a Dockerfile can simply extend this minimal image to add its dependencies (this is the Docker way of customizing things):

FROM gw000/keras:1.0.4-py3-th-gpu

RUN pip3 --no-cache-dir install pandas django
ADD project1/ /srv/project1/
RUN chmod +x /srv/project1/run.py

CMD ["/srv/project1/run.py"]

In the Dockerfiles all packages are installed using pip (not conda), because it is what Python users most commonly use. There are also automatic builds set up on DockerHub. I am prepared to merge and maintain this in the main Keras repository.

@tboquet
Contributor

tboquet commented Jul 21, 2016

@fchollet looks good to me!

@gw0, nice work! I like the way you handle Keras versions.
The structure is nice, but nvidia-docker seems to be the way to go. They are maintaining several versions of CUDA and cuDNN and their tool is stable now.
You could maybe open another PR after this one is merged so that we can discuss how to integrate elements of your structure?

@fchollet
Collaborator

You could maybe open another PR after this one is merged so that we can discuss how to integrate elements of your structure?

Sounds good to me. I like the ability to specify versions to create a production-safe image.

@fchollet
Collaborator

Will merge after latest comments have been addressed.

@gw0
Contributor

gw0 commented Jul 25, 2016

The structure is nice but nvidia-docker seems to be the way to go.

Yes, the nvidia-docker tool with corresponding images is promoted by Nvidia, but:

  • it is a custom command that most Docker users do not have and are not familiar with (it makes things less transparent, just like the Makefile in this PR)
  • it makes extending and using Dockerfiles more confusing (and also prevents them from being used with Docker cloud providers, where you do not have access to the host system)
  • it is also focused only on Nvidia GPUs, not other GPU manufacturers (once OpenCL support lands in Theano/TensorFlow), and even less on CPU usage (the most common case, where people would just like to run a quick experiment)

In my opinion things should be as understandable and easy to use/extend as possible. Sorry, but using a Makefile and nvidia-docker just introduces unnecessary complications and limitations.

@tboquet
Contributor

tboquet commented Jul 25, 2016

I understand your concern about the Makefile. It's possible to modify this after this PR is merged. It's also possible to use the REST API of the plugin with the docker command. Relying on custom-made CUDA images is also a limitation, since someone has to do the support for other OSes too. It's also difficult to support because Nvidia is iterating a lot on CUDA and cuDNN.

Theano uses Nvidia's repo, and so does TensorFlow.

You should definitely integrate your ideas into the repo; a discussion should be opened on the technologies to use.

@henry0312
Contributor

LGTM, though I don't know whether it's useful to use /src as WORKDIR.

@dosht
Contributor Author

dosht commented Jul 26, 2016

@henry0312 I was thinking to do that to make it easy to modify and experiment with the Keras code itself.

@fchollet
Collaborator

LGTM

@fchollet fchollet merged commit df84c69 into keras-team:master Jul 27, 2016
@varoudis

varoudis commented Aug 10, 2016

I don't think cuDNN 5 is a good idea.
I think it's only used when you build TF from source.

Below is a simple TF example.

EDIT: @dosht I built the Docker image with 7.5-cudnn4 from Nvidia and it's fine! Better update the Dockerfile.

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:86:00.0
Total memory: 11.17GiB
Free memory: 3.83GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:346] Loaded cudnn library: 5005 but source was compiled against 4007.  If using a binary install, upgrade your cudnn library to match.  If building from sources, make sure the library loaded matches the version you specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)

8 participants