Open
Description
Hi Tim,
First, thank you for your code.
I notice that you change the default learning rate for Imagenet in multi-GPU running by multiplying 0.1 with the number of GPUs. I am wondering did you actually use this to get the reported performance in the paper? Will this results in better performance only for sparse training or also dense performance.
Many thanks