Is this also the place to discuss the following:
Really? Interesting. I'm pretty sure LeNet won't work, at least. And I'm also pretty sure that AlexNet and GoogLeNet both had some form of mean subtraction in their original implementations.
This can be debated at length but a reference paper on the subject is Efficient BackProp. Normalizing inputs and initializing weights carefully aims at sticking to the range where transfer functions are nicely non-linear and have non-zero gradients. You can always get around this after some learning (the network will learn how much bias to apply to the receptive fields) but sometimes it is almost impossible to learn efficiently. If the input to e.g. a sigmoid is very large then the gradient is almost zero and it is very difficult to learn.

In the example from the regression tutorial I've seen cases where the network diverges (loss -> NaN) without mean subtraction. This has never happened with mean subtraction, and the learning curve is smoother with mean image subtraction. The common practice is to normalize inputs, so I guess that's what we need to do by default in DIGITS.

On that subject you're making a good point about the pixel range:
I agree it would make more sense to be in the [0 1] range. We could very well do that in Torch, but the thing is this isn't what Caffe is doing. Since DIGITS originally only supported Caffe, I opted for the [0 255] range to get as close a user experience as possible. I fear this will make us suffer when we want to support 16-bit images! Thanks a lot for the PR, this is great!
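To make the discussion concrete, here is a minimal sketch of the two preprocessing steps being debated, mean subtraction and rescaling [0 255] to [0 1]; the function name and array shapes are illustrative assumptions, not DIGITS' actual code:

```python
import numpy as np

def preprocess(img, mean_image):
    """Zero-center by subtracting the training-set mean, then rescale.

    `img` and `mean_image` are assumed to be (H, W, C) arrays in the
    [0, 255] range, with `mean_image` precomputed over the training set.
    """
    centered = img.astype(np.float32) - mean_image  # keeps e.g. sigmoid inputs away from saturation
    return centered / 255.0  # shrinks values to roughly [-1, 1]
```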
Oh you're right! I was thinking of something else, related to another point you brought up:
AlexNet and GoogLeNet both work for [0-1] and [0-255] data, but LeNet only works on [0-1] data. So nevermind about the mean subtraction - my bad!
For this you could have a look at the data shuffle capability and how it's used in the templates to selectively enable the corresponding form field.
@@ -356,16 +356,104 @@ <h4 style="display:inline-block;">Python Layers</h4>
<div class="col-sm-4">
<div class="well">
<h4>Data Transformations</h4>
<div class="form-group{{mark_errors([form.use_mean])}}">
I feel we haven't done enough of this, but you could perhaps move this section to a data_augmentation.html template and include it here and in digits/templates/models/images/generic/new.html.
Come to think of it, indeed I think LeNet was meant for bitonal images, so…
Thanks, that helped. I made it fairly dynamic now, and made a neat template. I took the liberty of putting the template in data_augmentation.html.
I like this feature; that's basically more than what I've done in my MATLAB preprocessing step (except the multiscale extraction - I also extract regions from the same image downsampled 2x, 4x, etc.). About mean subtraction, in my experience with autoencoders it did help (at least in the first tests I ran), so I ended up using it by default. I'll run a test on one of the latest models right now, and see what happens.
Perfect, would you like to help test when it's ready for review? Do you have any other augmentation steps that work well? For example, I have not included 'blurring' because it seemed to be relatively ineffective. How is multiscale extraction working for you?
Sure, it helps for the reasons accurately described above. But using 'Image' subtraction as opposed to 'Pixel' subtraction will probably not make a difference, I think.
I agree: image subtraction probably only helps for MNIST, where digits are nicely centred in the image, but for realistic datasets pixel subtraction might make more sense. Besides, image subtraction is painful to work with for networks that accept various input sizes, like FCNs.
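For reference, a small numpy sketch of the difference between the two mean-subtraction modes discussed here; the array names and shapes are assumptions for illustration:

```python
import numpy as np

# Illustrative stand-in: a stack of N training images, shape (N, H, W, C).
train_images = np.random.randint(0, 256, (100, 32, 32, 3)).astype(np.float32)

mean_image = train_images.mean(axis=0)          # 'Image': one (H, W, C) mean image
mean_pixel = train_images.mean(axis=(0, 1, 2))  # 'Pixel': one mean value per channel

# Pixel subtraction broadcasts over any spatial size, so it also works
# for variable-size inputs such as FCNs:
img = np.random.randint(0, 256, (64, 48, 3)).astype(np.float32)
centered = img - mean_pixel
```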
@TimZaman sure, I'd gladly test! What about adding noise (with different scales)? E.g. augment a set by adding images with 0%, 1%, 2%, 3%, 5%, 10% noise (the list of noise percentages specified by the user).
I ran a full training on a previous network, removing the mean subtraction: the resulting quality is unchanged. Thank you for pointing this out! 👍
High-pass and low-pass filtering… Need to think about that one, how it…
Stumbled upon some bugs in torch/image while testing scale and rotation. torch/image#169
I have tried both adding noise to the original images (which produces a huge amount of training/test data) and using dropout in place of noise.
Okay, I'll see if I can add that. What kind of noise do you suggest, and what kind of underlying distribution? A matrix drawn from a normal distribution, then multiplying? Each channel separate, or the same for each? We're gonna need to make a ton of assumptions :).
('rot90', '0, 90 or 270 degrees'),
('rot180', '0 or 180 degrees'),
('rotall', '0, 90, 180 or 270 degrees'),
],
Are rot180 and rotall useful? Rotation by 180 degrees is the same as vertical flipping.
Almost. A vertical flip + horizontal flip = 180 rotation. There is an interesting case to be made when you have fliplrud on, which could flip by 180 degrees (chance 1 in 4). Then if you have also turned on a rot* rotation that includes the 180 degree rotation, statistically your chance of getting a 180 rotation is slightly higher than any other flip or rotation, because of their redundancy.
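A quick numpy check of the overlap being described (illustrative only, not the PR's code):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# A horizontal flip followed by a vertical flip equals a 180-degree rotation,
# so 'fliplrud' and a rot* mode that includes 180 can produce the same transform.
assert np.array_equal(np.flipud(np.fliplr(x)), np.rot90(x, 2))
```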
Initial results. Trained on CIFAR10 with a great VGG network with the overfit we love to see (https://github.com/szagoruyko/cifar.torch/blob/master/models/vgg_bn_drop.lua). The results reveal the augmentation is working really nicely. Training speed per epoch does not seem to be impacted.

[Plots: Scale, Rotation, Flipping, HSV]

HSV was a lot of fun because of the Wikipedia-copied implementation that's in Torch (HSV isn't that well standardized, I guess), but at least it modulates something that resembles HSV. I also tested for speed, and this seems negligible; but a CPU that's up to the task is required (i.e. we need to be able to use 400% CPU for the four loader threads at times; this might be more on bigger images, although bigger images often require/use bigger (slower) networks).
That is totally awesome! That is a truly great feature! Can't wait to have that merged. Do you think you can add tests for this (not necessarily to test that augmentation reduces overfit, but at least to exercise the new code in the automatic tests)? Do you need help there?
Yes I need some help indeed; a few pointers would be great. Which tests do you suggest and where do you suggest I put them? I have not looked into how this is done in this project at all. |
I'd start with AWGN (Additive White Gaussian Noise) because it's easier to generate (just adding a 0-mean normally-distributed random variable will do), and you just need one parameter (SNR or standard deviation) to describe it (if you assume the signal's power is at 0 dB).
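A minimal sketch of AWGN augmentation along those lines, assuming float images in the [0, 255] range; the function name and default sigma are illustrative, not the PR's implementation:

```python
import numpy as np

def awgn(img, sigma=10.0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 255.0)  # keep values in the valid pixel range
```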
@pansk Off-topic: I am in favor of using rescaling over cropping. A rescale augmentation is more methodologically correct in my opinion, mostly due to the fact that with cropping you generally have the problem of 'how do I validate my source images?', which in DIGITS-Torch is a center crop.
@@ -0,0 +1,67 @@
{# Copyright (c) 2014-2016, NVIDIA CORPORATION. All rights reserved. #}
Minor comment: you can make this 2016 only, since this is a new file.
I'm sorry for the delay; I'm verifying, because it doesn't seem to converge well and I want to be sure it's not something related to my specific model.
Strange. What kind of model are you using, and what dataset? I verified this to…
I'm pretty sure it's my code; that's why I'm going to try with a simpler autoencoder model, but I have to build a specific dataset for that (MNIST is grayscale only).
Just for reference, I was going to post this comment a few days ago: I'm training an autoencoder with some custom datasets of uniform images (like wood, stone walls, grass, and so on), so I don't really need to validate my input, but border effects might be annoying for me. On the other hand, I'd like to provide just a bunch of pictures (possibly of different sizes) and let DIGITS generate its own set of images by extracting random regions, rotating, scaling, adding noise, and so on. That's why I'd like to extract more than one region per source image. My parameters were:

For the general case, I agree on cropping vs. scaling (actually, scaling gives you more robustness). Anyway, are you sure digits-torch just uses center-cropping? If so, the balloon help for the crop parameter is a bit misleading: "If specified, during a training a random square crop will be taken from the input image before using as input for the network."
@TimZaman probably meant to say that we are doing centre cropping during validation. During training, we are doing random cropping.
Correct, I said verbatim:
Sorry Tim, I didn't notice you were referring to validation-only.
It's just one of many valid choices. If you want to be fancier you can do a few crops and take the average (or the max before normalizing) of those. There's really no 'correct' way of validating when you are cropping, since you have to cut parts of your source image off during validation because your model is smaller than your actual images. But in practice, performance-wise, it doesn't matter much.
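As an illustration of the averaged-crops idea, a rough numpy sketch; the function name, the corner/center crop choice, and `predict` are all assumptions, not what DIGITS actually does:

```python
import numpy as np

def multi_crop_score(img, c, predict):
    """Average `predict` over the four corner crops plus the center crop.

    `img` is an (H, W, C) array, `c` is the crop size, and `predict` is
    any function mapping a (c, c, C) crop to a vector of class scores.
    """
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - c), (h - c, 0), (h - c, w - c),
               ((h - c) // 2, (w - c) // 2)]
    scores = [predict(img[t:t + c, l:l + c]) for t, l in offsets]
    return np.mean(scores, axis=0)
```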
With the latest revision, and if Travis agrees with me, this PR is done, I think.
This all looks good to me, thanks for the awesome PR! Can you squash your commits (possibly rebase too)?
Implemented in python and ui
Implemented dynamic UI toggle and moved augmentation html to template
Fixes uncovered during testing
Added AWGN augmentation, reduced complexity, typos, syntax fixes
Implemented a test to check at least all augmentations will run
Added test initialization params
Trivial language fixes and a few bugs
Squashed & rebased.
Splendid!
Any updates since then?
This looks good to me! @gheinrich please merge unless you have more concerns.
The 'data hook' idea has also grown on me. I think it's a great (and pretty…
Torch Data Augmentation
Data augmentation needs little introduction, I reckon. It counters overfitting and makes your model generalize better, yielding better validation accuracies; or alternatively, it allows you to use smaller datasets with similar performance.
In the zoo that is the internet, I see many implementations of different augmentations, of which few are proper and nicely portable. Apart from DIGITS offering a great UI, ease of use, and a turn-key deep learning solution, I strongly feel we can expand on the functional side as well to make this a deep learning killer app.
For Torch, I have implemented this during the Lua preprocessing, wired from frontend to backend, to enable DIGITS to do so. In #330 there was already an attempt at augmentation, which happened on the dataset-creation side; something I am strongly against. Resizing and cropping I would consider a transformation, while I consider augmenting the data in its container an augmentation. I therefore think it's fine to resize during dataset loading (and squashing/filling/etc.), but I would probably leave it at that.
Anyway, I set up a more dynamic structure to pass these options around on the Torch side; instead of adding a dozen arguments to each function, I am just adding a table.
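To illustrate the idea (sketched in Python for brevity; on the Torch side this is a Lua table), with all option names made up rather than the PR's actual keys:

```python
# Hypothetical augmentation options bundled into one structure, instead of
# a dozen separate function arguments (a dict here; a table in Lua).
augmentation_opts = {
    'flip': 'fliplrud',    # e.g. none | fliplr | flipud | fliplrud
    'rotation': 'rotall',  # e.g. none | rot90 | rot180 | rotall
    'awgn_sigma': 10.0,    # standard deviation of additive Gaussian noise
}

def load_sample(img, opts):
    # Each augmentation step reads only the keys it needs from `opts`,
    # so adding a new option does not change any function signatures.
    ...
```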
Implements the following (screenshot):
I have iterated through many augmentation types but these were the most useful. Almost done, now running elaborate tests.
Progress
The code is already functional, though see progress below.
See code, shoot!
Features
Implement UI option for normalization (scales the [0 255] range to [0 1])
Move the augmentation HTML to the data_augmentation.html template
Testing