
Docker image for testing and experimenting with Keras #3035

Merged
merged 12 commits into keras-team:master on Jul 27, 2016

Conversation

dosht
Contributor

@dosht dosht commented Jun 21, 2016

To avoid issues installing Theano or CUDA on any Linux distro, this PR automates the environment with a Docker image for Keras and CUDA that can share the host's GPUs with the container via Nvidia's script.

  • Docker image with CUDA support on Ubuntu 14.04
  • nvidia-docker script to forward the GPU to the container
  • Makefile to simplify the docker commands for build, run, test, etc.
  • Useful tools like jupyter notebook, ipdb and sklearn for experiments

The Makefile is optional; it just makes the docker commands easier, e.g.

$ make test  # runs all tests in the docker container
$ make ipython  # runs ipython shell with current Keras in PYTHONPATH
$ make notebook  # runs jupyter notebook with current Keras in PYTHONPATH

This can be customized further by choosing a specific GPU and a dataset directory, e.g.

$ make notebook GPU=0 DATA=/home/fchollet/datasets
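As a rough sketch, a target like the one above presumably expands to a docker invocation along these lines (the image name keras and the exact flags are assumptions, not the PR's actual contents):

```shell
# Roughly what `make notebook GPU=0 DATA=/home/fchollet/datasets` might run:
# nvidia-docker forwards GPU 0 into the container, the current checkout is
# mounted at /src, and the dataset directory at /data.
GPU=0 nvidia-docker run -it \
    -v `pwd`:/src \
    -v /home/fchollet/datasets:/data \
    --net=host \
    keras \
    jupyter notebook
```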

I've already been using this for a while and it makes my life easier. I will update the README as well if you like this suggestion.

Thanks,

pip install theano ipdb pytest pytest-cov python-coveralls pytest-xdist pep8 pytest-pep8 && \
conda clean -yt

ENV THEANO_FLAGS='mode=FAST_RUN,device=gpu,nvcc.fastmath=True,floatX=float32'
Contributor

Why not echo a .theanorc file?
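For reference, writing the same flags into a .theanorc from the Dockerfile could look roughly like this (a sketch of the suggestion, not the PR's actual change):

```dockerfile
# Hypothetical equivalent of the ENV THEANO_FLAGS line above,
# written to /root/.theanorc so Theano picks it up at import time.
RUN printf '[global]\nmode = FAST_RUN\ndevice = gpu\nfloatX = float32\n\n[nvcc]\nfastmath = True\n' > /root/.theanorc
```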

@tboquet
Contributor

tboquet commented Jun 21, 2016

Nice addition! Could you update the Makefile so it can be used with the nvidia-docker plugin?
The script is deprecated and the plugin is stable right now.

@fchollet
Collaborator

  • This seems to be exclusively for use with Theano. Any reason why? TensorFlow is gaining in popularity and will eventually be the default backend for Keras.
  • Who would take care of maintaining and updating this Docker image? I am not in a position to take responsibility for it.
  • Not everyone is familiar with Docker. I think it would be very important to have a tutorial somewhere on how to use this Docker image to get started with Keras (e.g. on an AWS GPU instance).

@dosht
Contributor Author

dosht commented Jun 22, 2016

@fchollet, for the first point, I added TensorFlow as the default backend.
It can be switched to Theano with: make notebook BACKEND=theano
@tboquet, I removed the deprecated nvidia-docker script and added .theanorc
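Presumably the switch works by passing the KERAS_BACKEND environment variable, which Keras reads at import time, into the container; a sketch with the image name assumed:

```shell
# Hypothetical expansion of `make notebook BACKEND=theano`:
# -e sets KERAS_BACKEND inside the container so Keras loads the Theano backend.
nvidia-docker run -it -v `pwd`:/src -e KERAS_BACKEND=theano keras jupyter notebook
```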

@@ -0,0 +1,52 @@
FROM nvidia/cuda:7.5-cudnn4-devel
Contributor

cuDNN v5 is available 😃 :
FROM nvidia/cuda:7.5-cudnn5-devel

Collaborator

And quite a bit faster too!

@fchollet
Collaborator

Can someone more familiar with Docker than me review this?

@tboquet
Contributor

tboquet commented Jun 23, 2016

I can suggest some small changes, but it would be good to have a third pass just to be sure everything is good! Another point: @dosht, did you try to launch the tests using Theano and TensorFlow in the container?

pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp35-cp35m-linux_x86_64.whl && \
pip install git+git://github.com/Theano/Theano.git && \
pip install ipdb pytest pytest-cov python-coveralls coverage==3.7.1 pytest-xdist pep8 pytest-pep8 && \
conda install Pillow scikit-learn notebook pandas matplotlib nose pyyaml six h5py && \
Contributor

Don't you need to install the libhdf5-dev deb package for h5py to work properly?
Conda will link the compiled binaries, but you may get an error because the underlying C library is not there.
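A sketch of what that suggestion would look like in the Dockerfile (package name as on Ubuntu 14.04):

```dockerfile
# Install the HDF5 C library and headers before the Python bindings,
# so h5py has the underlying library to link against at runtime.
RUN apt-get update && \
    apt-get install -y --no-install-recommends libhdf5-dev && \
    rm -rf /var/lib/apt/lists/*
```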

@henry0312
Contributor

Why do you use conda to install Python?
I think we can follow the approach in docker-library/python, which is the official Docker image for Python.
(cf. https://github.com/docker-library/python/blob/master/3.5/Dockerfile)

@dosht
Contributor Author

dosht commented Jun 26, 2016

@tboquet When running the tests, I got some failures; most of them were "method not found".
When I switched to device=cpu and the Theano backend, I got only this failure:

[gw0] linux -- Python 3.5.1 /opt/conda/bin/python
Slave 'gw0' crashed while running 'tests/keras/backend/test_backends.py::TestBackend::()::test_conv2d' 
======== 1 failed, 168 passed, 1 skipped in 484.22 seconds =======

@fchollet
Collaborator

Consider moving the two files to a docker folder to avoid cluttering the root folder.

Would you consider authoring a blog post on blog.keras.io to explain how to use this Docker image?

@dosht
Contributor Author

dosht commented Jul 3, 2016

@fchollet I moved the Docker files into a subdirectory and will write a blog post draft soon.
@henry0312 I used conda because I found it simpler, but I will follow docker-library/python and give it a try. I also added the .theanorc in a separate file.

@tboquet
Contributor

tboquet commented Jul 4, 2016

@dosht conda is used on Travis for the tests; it would be consistent to use it in your container as well.

I will try to run the tests and compare the behaviour with some of my containers.

In my previous comment I mentioned Windows. Making the container work on either a Linux distro or Windows is not trivial using their new native application or the VM. I thought it would be possible using the REST API, but it's not the best way to use the plugin and it would become messy.

Another point, it would be neat to have an automated build on Dockerhub 😃 .

@@ -1,4 +1,4 @@
FROM nvidia/cuda:7.5-cudnn4-devel
FROM nvidia/cuda:7.5-cudnn5-devel
Contributor

@dosht Have you tested whether TensorFlow installed from the package (.whl) works with cuDNN 5? The TF docs state:

The GPU version (Linux only) works best with Cuda Toolkit 7.5 and cuDNN v4. other versions are supported (Cuda toolkit >= 7.0 and cuDNN 6.5(v2), 7.0(v3), v5) only when installing from sources.

(see https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#requirements).

@lukovkin
Contributor

lukovkin commented Jul 4, 2016

@dosht
Cool idea, adding a Docker container for Keras. We have been using one for a while.
Some comments:

  • Have you considered using docker-compose instead of make?
  • Do you think it makes sense to install the GPU-backed TensorFlow from source instead of the .whl? It would allow building from the latest master branch and using a non-default cuDNN, but it fails to build automatically on DockerHub (it hits the 2-hour time limit). TF has a Dockerfile for it (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu), but Python should be changed to 3.5.

@dosht
Contributor Author

dosht commented Jul 17, 2016

@tboquet, I added this repo on DockerHub: https://hub.docker.com/r/dosht/keras/, but the namespace will change later, of course.

@lukovkin, starting from the TensorFlow image is a good idea, but it's still using cuDNN 4. What do you think about that?
docker-compose is also a good idea, and I'm trying it out to see how it can fit our setup.

@dosht
Contributor Author

dosht commented Jul 17, 2016

@fchollet, I added a README file describing how it works; this might change once the automated build on DockerHub is set up, and if we use docker-compose instead of make.

I will convert the readme into a blog post when this PR is done.

@fchollet
Collaborator

@dosht sounds good. There seem to be multiple typos in the README; please have someone proofread it.

@fchollet
Collaborator

@lukovkin, @tboquet, @henry0312 what do you guys think, are we all clear here?

# Python
ARG python_version=3.5.1
RUN conda install -y python=${python_version} && \
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp35-cp35m-linux_x86_64.whl && \
Contributor

It's better to define the version of TensorFlow before this line, I guess.

@henry0312
Contributor

@dosht By the way, how do you install Keras in your Dockerfile?
I can see Theano, TensorFlow and so on, but I don't see how Keras itself is installed.

RUN conda install -y python=${python_version} && \
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp35-cp35m-linux_x86_64.whl && \
pip install git+git://github.com/Theano/Theano.git && \
pip install ipdb pytest pytest-cov python-coveralls coverage==3.7.1 pytest-xdist pep8 pytest-pep8 && \
Contributor

Is pip install git+git://github.com/fchollet/keras.git needed?

Contributor Author

@dosht dosht Jul 26, 2016

I was thinking not to install Keras, but to use the current Keras code by appending it to PYTHONPATH:
ENV PYTHONPATH='/src/:$PYTHONPATH'
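The effect of that ENV line can be demonstrated outside Docker too: a directory placed on PYTHONPATH shadows any installed copy of the package. A small sketch (the dev-checkout marker and the /tmp path are made up for illustration):

```shell
# Simulate /src holding a Keras checkout: a package directory earlier on
# PYTHONPATH wins over anything installed in site-packages.
mkdir -p /tmp/src/keras
echo "__version__ = 'dev-checkout'" > /tmp/src/keras/__init__.py
PYTHONPATH=/tmp/src python3 -c "import keras; print(keras.__version__)"
# prints: dev-checkout
```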

Contributor

I get it.

@henry0312
Contributor

For visualization,

pip install pydot_ng
sudo apt-get install graphviz

may be needed?
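In Dockerfile form, that suggestion would look roughly like this (a sketch; graphviz provides the dot binary, pydot_ng the Python bindings):

```dockerfile
# Hypothetical addition for Keras model visualization support.
RUN apt-get update && \
    apt-get install -y --no-install-recommends graphviz && \
    rm -rf /var/lib/apt/lists/* && \
    pip install pydot_ng
```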

@gw0
Contributor

gw0 commented Jul 20, 2016

I have developed and have been using a set of Keras Docker containers for some time now. The philosophy is to make it simple to understand (no make files or nvidia-docker dependencies), to have tags for all possible versions and variants (Python 2/3, Theano/TensorFlow, CPU/GPU), and to have minimal images (for production use) plus a full image with all batteries included (jupyter, ipython). All containers are managed only with the basic docker command, so that all Docker users are familiar with how to use it. Check out:

For example, for quick experiments and development of a model, one would mount /home/user/project1 into the container and start a one-time-use IPython shell:

# for python 2
$ docker run -it --rm -v=/home/user/project1:/srv gw000/keras-full ipython2
# for python 3
$ docker run -it --rm -v=/home/user/project1:/srv gw000/keras-full ipython3

Or start a Jupyter Notebook in background accessible on http://127.0.0.1:8888/:

$ docker run -d --name keras-full -p=6006:6006 -p=8888:8888 -v=/home/user/project1:/srv gw000/keras-full

For an already developed Keras model that is ready for production, one would specify the exact version, backend, and device to use (so that it is a reproducible and known environment):

$ docker run -d --name project1 -v /srv/project1:/srv/project1 gw000/keras:1.0.4-py2-tf-cpu /srv/project1/run.py

For more complex projects, a Dockerfile can simply extend this minimal image to add its dependencies (this is the Docker way of customizing things):

FROM gw000/keras:1.0.4-py3-th-gpu

RUN pip3 --no-cache-dir install pandas django
ADD project1/ /srv/project1/
RUN chmod +x /srv/project1/run.py

CMD ["/srv/project1/run.py"]

In the Dockerfiles all packages are installed using pip (not conda), because it is what Python users most commonly use. There are also automatic builds set up on DockerHub. I am prepared to merge and maintain this in the main Keras repository.

@tboquet
Contributor

tboquet commented Jul 21, 2016

@fchollet looks good to me!

@gw0, nice work! I like the way you handle Keras versions.
The structure is nice, but nvidia-docker seems to be the way to go. They are maintaining several versions of CUDA and cuDNN and their tool is stable now.
You could maybe open another PR after this one is merged so that we can discuss how to integrate elements of your structure?

@fchollet
Collaborator

You could maybe open another PR after this one is merged so that we can discuss how to integrate elements of your structure?

Sounds good to me. I like the ability to specify versions to create a production-safe image.

@fchollet
Collaborator

Will merge after latest comments have been addressed.

@gw0
Contributor

gw0 commented Jul 25, 2016

The structure is nice but nvidia-docker seems to be the way to go.

Yes, the nvidia-docker tool with corresponding images is promoted by Nvidia, but:

  • it is a custom command that most Docker users do not have and are not familiar with (it makes things less transparent, just like the Makefile in this PR)
  • it makes extending and using Dockerfiles more confusing (and also prevents them from being used with Docker cloud providers, where you do not have access to the host system)
  • it is also focused only on Nvidia GPUs, not other GPU manufacturers (once OpenCL support lands in Theano/TensorFlow), and even less on CPU usage (the most common case, where people would just like to run a quick experiment)

In my opinion things should be as understandable and easy to use/extend as possible. Sorry, but using a Makefile and nvidia-docker just introduces unnecessary complications and limitations.

@tboquet
Contributor

tboquet commented Jul 25, 2016

I understand your concern about the Makefile. It's possible to modify this after this PR is merged. It's also possible to use the REST API of the plugin with the docker command. Relying on custom-made CUDA images is also a limitation, since someone has to do the support for other OSes too. It's also difficult to support because Nvidia is iterating a lot on CUDA and cuDNN.

Theano uses Nvidia's repo, and so does TensorFlow.

You should definitely integrate your ideas into the repo; a discussion should be opened on the technologies to use.

@henry0312
Contributor

LGTM, though I don't know whether it's useful to use /src as WORKDIR.

@dosht
Contributor Author

dosht commented Jul 26, 2016

@henry0312 I was thinking to do that to make it easy to modify and experiment with the Keras code itself.

@fchollet
Collaborator

LGTM

@fchollet fchollet merged commit df84c69 into keras-team:master Jul 27, 2016
@varoudis

varoudis commented Aug 10, 2016

I don't think cuDNN 5 is a good idea.
I think it's only used when you build TF from source.

Below is a simple TF example.

EDIT: @dosht I built the Docker image with 7.5-cudnn4 from Nvidia and it's fine! Better update the Dockerfile.

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:86:00.0
Total memory: 11.17GiB
Free memory: 3.83GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:346] Loaded cudnn library: 5005 but source was compiled against 4007.  If using a binary install, upgrade your cudnn library to match.  If building from sources, make sure the library loaded matches the version you specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)

8 participants