Learning rate for ImageNet

Hi Tim,

First, thank you for your code. 

I noticed that you changed the default learning rate for ImageNet in multi-GPU runs by multiplying 0.1 by the number of GPUs. Did you actually use this setting to obtain the results reported in the paper? And does it improve performance only for sparse training, or for dense training as well?
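To be concrete, this is my understanding of the change. It is a minimal sketch, not the repository's actual code. It assumes the base rate of 0.1 is simply multiplied by the GPU count, and the function name `scaled_learning_rate` is hypothetical:

```python
def scaled_learning_rate(base_lr: float, num_gpus: int) -> float:
    """Hypothetical helper: scale the single-GPU base learning rate
    linearly with the number of GPUs (assumed behavior, not the repo's code)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus


# Example: a base rate of 0.1 on 4 GPUs gives an effective rate of 0.1 * 4.
lr = scaled_learning_rate(0.1, 4)
```

So with 8 GPUs the effective rate would be 0.8 instead of the 0.1 default, which is a large change if the paper's numbers were produced with a different GPU count.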

Many thanks