
Torch Data Augmentation #777

Merged: 1 commit, merged on Jul 26, 2016

Conversation

@TimZaman (Contributor) commented May 24, 2016

Data augmentation needs little introduction, I reckon. It counters overfitting and makes your model generalize better, yielding better validation accuracies; alternatively, it allows you to use smaller datasets with similar performance.

In the zoo that is the internet, I see many implementations of different augmentations, of which few are proper and nicely portable. Apart from DIGITS providing a great UI, ease of use, and a turn-key deep learning solution, I strongly feel we can expand on the functional side as well to make this a deep learning killer app.

For Torch, I have made an implementation that performs the augmentation during the Lua preprocessing step, wired from frontend to backend so DIGITS can drive it. In #330 there was already an attempt at augmentation, which happened on the dataset-creation side; something I am strongly against. Resizing and cropping I would consider a transformation, whereas modifying the data once it is in its container is what I consider an augmentation. I therefore think it's fine to resize during dataset loading (and squash/fill/etc.), but I would leave it at that.

Anyway, I set up a more dynamic structure to pass these options around on the Torch side; instead of adding a dozen arguments to each function, I just pass a single table.
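
To make that structure concrete, here is a hypothetical sketch in Python (the PR itself does this with a Lua table); every key name and value below is illustrative, not the actual fields used in the PR:

```python
# Hypothetical augmentation-options structure, passed around as one object
# instead of a dozen separate function arguments (all names are made up).
augmentation_opts = {
    "flip": "fliplr",               # none | fliplr | flipud | fliplrud
    "quad_rotation": "rotall",      # none | rot90 | rot180 | rotall
    "rotation_deg": 5,              # max arbitrary rotation, in degrees
    "scale_std": 0.05,              # stddev of the random rescale factor
    "hsv_std": (0.02, 0.04, 0.06),  # per-channel HSV jitter stddevs
    "noise_std": 0.0,               # AWGN stddev; 0 disables it
}

def wants_hflip(opts):
    """Example consumer reading one field from the shared options table."""
    return opts.get("flip") in ("fliplr", "fliplrud")
```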

Implements the following (screenshot):
[screenshot of the data augmentation UI options]

I have iterated through many augmentation types, but these were the most useful. Almost done; now running elaborate tests.

Progress

The code is already functional, but see the progress below.
See the code and shoot!

Features

  • Make UI data transforms only visible for the Torch framework (invisible for Caffe)
  • Implement UI option for normalization (scales [0, 255] to [0, 1])
  • Data Augmentation UI
  • Flips (mirrors)
  • Quadrilateral rotations
  • Arbitrary rotations
  • Arbitrary scales
  • Augmenting in HSV space
  • Augmenting with noise (Thoughts?)
  • [Travis] Tests
  • Use Data Augmentation Template: data_augmentation.html

Testing

  • No augmentation
  • Flips (mirrors)
  • Quadrilateral rotations
  • Arbitrary rotations
  • Arbitrary scales
  • Arbitrary rotations & arbitrary scales
  • Augmenting in HSV space
  • Augmenting with noise
  • All Augmentations & benchmark speed; identify bottlenecks
  • Verify that models report the expected trade-off of slower learning / less overfitting: more generalization.

@TimZaman (Contributor, Author) commented May 24, 2016

Is this also the place to discuss the following:

  • The default 'Mean Subtraction' option is 'Image'. I have never really seen any model of mine perform better with it than with 'None', only worse. I would suggest defaulting to 'None' (moreover, it's hefty on the CPU).
  • test.lua expects square croplen images through an assert, whereas main.lua magically takes the minimum of (croplen, x-dim, y-dim), and might ignore the crop parameter altogether without the operator knowing about it.
  • The image input (during and after Torch preprocessing) is always in the range [0, 255], I guess due to backend (LMDB/HDF5) constraints. Shouldn't we scale this to [0, 1] by default? I recall the standard LeNet network has :mul(1/255) as its first layer; what's the general convention here? We could also make normalization optional (see the sketch right below).
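
A minimal sketch of the optional [0, 255] to [0, 1] scaling being discussed (NumPy, purely illustrative; the real change would live in the Torch preprocessing path):

```python
import numpy as np

def normalize(img_uint8, enabled=True):
    """Optionally rescale an 8-bit image from [0, 255] to [0, 1].

    Equivalent in spirit to LeNet's :mul(1/255) first layer, but applied
    once during preprocessing instead of inside the network.
    """
    img = img_uint8.astype(np.float32)
    return img / 255.0 if enabled else img
```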

@lukeyeager (Member):

The default 'Mean Subtraction' is one 'Image'. I have never really seen any model of mine perform better than 'None', only worse. I would suggest a default to None (moreover, it's hefty on the CPU).

Really? Interesting. I'm pretty sure LeNet won't work, at least.

And I'm also pretty sure that AlexNet and GoogLeNet both had some form of mean subtraction in their original implementations.

@TimZaman (Contributor, Author):

Really? Interesting. I'm pretty sure LeNet won't work, at least.
And I'm also pretty sure that AlexNet and GoogLeNet both had some form of mean subtraction in their original implementations.

What I have come to realize is that deep learning is so complex and has so many bells and whistles that in many ways it's not an exact science anymore, but scientific guesses and gut feelings. If you are using mean subtraction all the time, you will probably keep using it forever, without really knowing why. The same thing goes for max pooling, for example: in most cases you might as well increase the stride without any loss. It's something someone thought of, it seemed to work well (or it didn't), and then nobody bothered much about it afterwards.

The same goes for preprocessing into a different colorspace: RGB to HSV, RGB with local normalization, RGB with super-fancy ZCA whitening. In essence, you are never actually adding any information (obviously); you can only take information out. What you can do is emphasize certain parts (as with enhancing edge values through local contrast or whitening), but again, deep networks can figure that out without our help. In my experience, plain RGB does fine.

Let's look at what mean subtraction does for MNIST:
[image: MNIST mean subtraction example]

My gut feeling says the mean-subtracted image probably only makes things worse (what does the mean of all these images mean, anyway?).

Let's investigate MNIST with the default DIGITS settings:

| Mean subtraction | Max accuracy (%) | Final accuracy (%) |
| ---------------- | ---------------- | ------------------ |
| Image            | 97.92            | 97.94              |
| Pixel            | 98.19            | 98.15              |
| None             | 98.17            | 98.16              |

The above table is in agreement with what I said earlier: why would image subtraction help? We originally do this just to put the mean at 0, because we learned from statistics that that's proper. But a deep neural network can handle almost any range just fine, as long as the hyperparameters are balanced to the input.
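
To spell out the three options being compared, here is a hypothetical NumPy sketch (in DIGITS the mean is of course computed over the training set at dataset-creation time; this only illustrates the difference between 'Image', 'Pixel', and 'None'):

```python
import numpy as np

def subtract_mean(images, mode="none"):
    """Apply one of the mean-subtraction modes to a batch of images.

    images: float array of shape (N, H, W, C).
    'image' subtracts the per-position mean image, 'pixel' subtracts a
    single per-channel mean value, and 'none' leaves the data untouched.
    """
    if mode == "image":
        return images - images.mean(axis=0)          # mean image, (H, W, C)
    if mode == "pixel":
        return images - images.mean(axis=(0, 1, 2))  # mean pixel, (C,)
    return images
```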

Another example, then: the mean image of a Kaggle challenge I found:
[mean image of a Kaggle challenge dataset]
Subtracting that mean image ^ will yield really weird results.

Or the mean image of ImageNet:
[mean image of ImageNet]
This gray blob won't help your ImageNet accuracy one bit ^.

In their original implementations I bet many used mean subtraction, but that does not mean it yields the best results, which I think is what matters more.

See if you can reproduce the results from the above table. Why did you say it won't work for LeNet?

@gheinrich (Contributor):

This can be debated at length, but a reference paper on the subject is Efficient BackProp. Normalizing inputs and initializing weights carefully aims at sticking to the range where transfer functions are nicely non-linear and have non-zero gradients. You can always get around this after some learning (the network will learn how much bias to apply to the receptive fields), but sometimes it is almost impossible to learn efficiently. If the input to e.g. a sigmoid is very large, then the gradient is almost zero and it is very difficult to learn.
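
A quick numerical illustration of that last point (not from the paper or the PR, just a sanity check): the sigmoid gradient collapses once inputs get large, which is exactly what input normalization tries to avoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.5))    # ~0.235: plenty of gradient to learn from
print(sigmoid_grad(100.0))  # ~3.7e-44: effectively zero, learning stalls
```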

In the example from the regression tutorial I've seen cases where the network diverges (loss->NaN) without mean subtraction. This has never happened with mean subtraction. The learning curve is smoother with mean image subtraction:
[learning curve with mean image subtraction]
Compare with no mean subtraction:
[learning curve without mean subtraction]

The common practice is to normalize inputs so I guess that's what we need to do by default in DIGITS. On that subject you're making a good point about the pixel range:

The input of the image (during and after torch preprocessing) is always in the range [0 255]

I agree it would make more sense to be in the [0, 1] range. We could very well do that in Torch, but the thing is, this isn't what Caffe is doing. Since DIGITS originally only supported Caffe, I opted for the [0, 255] range to keep the user experience as close as possible. I fear this will make us suffer when we want to support 16-bit images!

Thanks a lot for the PR this is great!

@lukeyeager (Member):

See if you can reproduce the results from the above table. Why did you say it won't work for LeNet?

Oh you're right! I was thinking of something else, related to something else you brought up:

The input of the image (during and after torch preprocessing) is always in the range [0 255], I guess due to the backend (lmdb/hdf5) constraints. Shouldn't we by default scale this to [0 1]? I recollect the LeNet standard network have the :mul(1/255) in there as first layer; what's the general AI network convention here? We can also make normalization optional?

AlexNet and GoogLeNet both work for [0-1] and [0-255] data, but LeNet only works on [0-1] data.

So nevermind about the mean subtraction - my bad!

@gheinrich (Contributor):

Make UI data transforms only visible for the Torch framework (invisible for Caffe)

For this you could have a look at the data shuffle capability and how it's used in the templates to selectively enable the corresponding form field.

@@ -356,16 +356,104 @@ <h4 style="display:inline-block;">Python Layers</h4>
<div class="col-sm-4">
<div class="well">
<h4>Data Transformations</h4>
<div class="form-group{{mark_errors([form.use_mean])}}">
@gheinrich (Contributor) commented May 24, 2016

I feel we haven't done enough of this but you could perhaps move this section to a data_augmentation.html template and include it here and in digits/templates/models/images/generic/new.html

@TimZaman (Contributor, Author):

Come to think of it, indeed I think LeNet was meant for bitonal images, so probably no default mean subtraction at all. Moreover, you don't need mean subtraction over the dataset to get a range that complies with the paper you are referring to, so we do not need to discuss that at length; you are right. But as just shown with MNIST, mean subtraction by the pixel-for-pixel mean of all images in the dataset will never yield better results, especially when the dataset does not contain hundreds of thousands of images. That is why I proposed not using 'Image' mean subtraction by default.


@TimZaman (Contributor, Author) commented May 25, 2016

For this you could have a look at the data shuffle capability and how it's used in the templates to selectively enable the corresponding form field.

Thanks, that helped. I made it fairly dynamic now, and made a neat template. I took the liberty of putting the template in digits/templates/models/ because it can be used by both the generic and the classification class; hope that's all right. Actually, I'd rather put it in digits/templates/models/images/, but that depends on the following: in an effort to reduce redundancy we could merge the identical files
generic/large_graph.html with classification/large_graph.html
generic/custom_network_explanation.html with classification/custom_network_explanation.html
and maybe put those in a more general folder.

@pansk commented May 25, 2016

I like this feature; it's basically more than what I've done in my MATLAB preprocessing step (except for the multiscale extraction: I also extract regions from the same image downsampled 2x, 4x, etc.).

About mean subtraction, in my experience with autoencoders it did help (at least in the first tests I took), so I ended up using it by default. I'll run a test on one of the latest models right now, and see what happens.

@TimZaman (Contributor, Author) commented May 26, 2016

I like this feature

Perfect, would you like to help test when it's ready for review? Do you have any other augmentation steps that work well? For example, I have not included 'blurring' because it seemed to be relatively ineffective. How is multiscale extraction working for you?

About mean subtraction, in my experience with autoencoders it did help (at least in the first tests I took), so I ended up using it by default.

Sure it helps for the reasons accurately described above. But using 'Image' subtraction as opposed to 'Pixel' subtraction will probably not make an improvement I think.

@gheinrich (Contributor):

using 'Image' subtraction as opposed to 'Pixel' subtraction will probably not make an improvement I think

I agree: image subtraction probably only helps for MNIST, where the digits are nicely centred in the image, but for realistic datasets pixel subtraction might make more sense. Besides, image subtraction is painful to work with for networks that accept various input sizes, like FCNs.

@pansk commented May 26, 2016

@TimZaman sure, I'd gladly test!
Another augmentation I would like is a filtering pass (lowpass/highpass), but I think it's very application-specific, and it's probably not worth generalising this step.

What about adding noise (at different scales)? E.g. augment a set by adding images with 0%, 1%, 2%, 3%, 5%, 10% noise (the list of noise percentages specified by the user).

Sure it helps for the reasons accurately described above. But using 'Image' subtraction as opposed to 'Pixel' subtraction will probably not make an improvement I think.

I ran a full training on a previous network, removing the mean subtraction: the resulting quality is unchanged. Thank you for pointing this out! 👍

@TimZaman (Contributor, Author):

High-pass and low-pass filtering... I need to think about that one, and how it would help generalization. How do you use it now: DFT, or something simpler? Adding noise or blurring (blurring would already be low-pass filtering, I guess) is straightforward and we can put it in, but I have often seen people report that it's almost the same as adding dropout: you're not changing the image significantly enough; compare that to a horizontal flip. A horizontal flip for something like natural images almost doubles your dataset, because the flipped image looks quite different, whereas perturbing pixels with noise is like dropping out some pixels. Having said that, I might as well try. What are your experiences there?


@TimZaman (Contributor, Author):

Stumbled upon some bugs in torch/image while testing scale and rotation. torch/image#169
Now running CIFAR10 benchmarks for many augmentation types.

@pansk commented May 27, 2016

I have tried both adding noise to the original images (a huge amount of training/test data is produced) and using dropout in place of noise.
I think adding a known amount of noise is more straightforward, since you can obtain the performance values for different noise scales directly from your test results.

@TimZaman (Contributor, Author) commented May 27, 2016

I think adding a known amount of noise is more straightforward, since you can obtain the performance values for different noise scales directly from your test results.

Okay, I'll see if I can add that. What kind of noise do you suggest, and what kind of underlying distribution? A matrix of normally distributed samples, then multiplying? Each channel separate, or the same for each? We're going to need to make a ton of assumptions :).

('rot90', '0, 90 or 270 degrees'),
('rot180', '0 or 180 degrees'),
('rotall', '0, 90, 180 or 270 degrees.'),
],
Contributor:

Are rot180 and rotall useful? Rotation by 180 degrees is the same as vertical flipping.

@TimZaman (Contributor, Author) commented May 27, 2016

Almost. A vertical flip + a horizontal flip = a 180-degree rotation.
There is an interesting case to be made when you have fliplrud on, which can flip by 180 degrees (chance 1 in 4). If you have also turned on a rot* rotation that includes the 180-degree rotation, then statistically your chance of getting a 180-degree rotation is slightly higher than that of any other flip or rotation, because of their redundancy.
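
A quick NumPy check of that equivalence (illustrative only, not part of the PR):

```python
import numpy as np

img = np.random.rand(32, 32, 3)            # any H x W x C image
vflip = img[::-1, :, :]                    # vertical flip (up-down)
rot180 = np.rot90(img, k=2, axes=(0, 1))   # rotation by 180 degrees

# A vertical flip alone is not a 180-degree rotation...
assert not np.array_equal(vflip, rot180)
# ...but a vertical flip followed by a horizontal flip is.
assert np.array_equal(vflip[:, ::-1, :], rot180)
```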

@TimZaman (Contributor, Author) commented May 27, 2016

Initial results. Trained on CIFAR10 with a great VGG network, with the overfitting we love to see (https://github.com/szagoruyko/cifar.torch/blob/master/models/vgg_bn_drop.lua).

The results reveal the augmentation is working really nicely. Training speed per epoch does not seem to be impacted.

Scale

[training curves: random scale augmentation]

Rotation

[training curves: arbitrary rotation augmentation]

Flipping

[training curves: horizontal flip augmentation]

HSV

HSV was a lot of fun because of the Wikipedia-copied implementation that's in Torch (HSV isn't that well standardized, I guess), but at least it models something that resembles HSV.
[training curves: HSV augmentation]
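
For reference, a rough NumPy sketch of what HSV-space jitter amounts to (the PR's actual implementation is in Lua/Torch; the function name, parameters, and defaults here are made up):

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def augment_hsv(img, std_h=0.02, std_s=0.04, std_v=0.06, rng=None):
    """Jitter an RGB image (H x W x 3, floats in [0, 1]) in HSV space.

    One random offset per channel and per image, drawn from a zero-mean
    normal with the given stddev; hue wraps around, S and V are clipped.
    """
    rng = rng or np.random.default_rng()
    hsv = rgb_to_hsv(img)
    hsv[..., 0] = (hsv[..., 0] + rng.normal(0, std_h)) % 1.0
    hsv[..., 1] = np.clip(hsv[..., 1] + rng.normal(0, std_s), 0, 1)
    hsv[..., 2] = np.clip(hsv[..., 2] + rng.normal(0, std_v), 0, 1)
    return hsv_to_rgb(hsv)
```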

I also tested for speed, and the overhead seems negligible; but a CPU that's up to the task is required (i.e. we need to be able to use 400% CPU for the four loader threads at times). This might be more for bigger images, although bigger images often require/use bigger (slower) networks.

@TimZaman (Contributor, Author) commented May 30, 2016

All Augmentations

HFlip, rotation 5, scale stddev 0.05, HSV (0.01, 0.02, 0.04) yields a full 3 percent validation accuracy increase and a validation loss decrease from 0.59 to 0.35.

[training curves: all augmentations combined]

@gheinrich (Contributor):

That is totally awesome! That is a truly great feature! Can't wait to have that merged.

Do you think you can add tests for this (not necessarily to test that augmentation reduces overfitting, but at least to exercise the new code in the automatic tests)? Do you need help there?

@TimZaman (Contributor, Author):

Do you think you can add tests for this (...). Do you need help there?

Yes, I need some help indeed; a few pointers would be great. Which tests do you suggest, and where do you suggest I put them? I have not looked into how this is done in this project at all.

@pansk commented May 30, 2016

I'd start with AWGN (Additive White Gaussian Noise) because it's easier to generate (just adding a 0-mean normal-distributed random variable will do), and you just need one parameter (SNR or standard deviation) to describe it (if you assume the signal's power at 0dB).
It is also a good approximation for thermal noise.
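
A minimal NumPy sketch of that suggestion (illustrative; the stddev value is arbitrary, and images are assumed to be scaled to [0, 1]):

```python
import numpy as np

def add_awgn(img, std=0.05, rng=None):
    """Additive white Gaussian noise for an image with values in [0, 1].

    One zero-mean normal sample per pixel and per channel, with the given
    standard deviation; the result is clipped back to the valid range.
    """
    rng = rng or np.random.default_rng()
    return np.clip(img + rng.normal(0.0, std, size=img.shape), 0.0, 1.0)
```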

@TimZaman (Contributor, Author) commented Jun 6, 2016

@pansk
Cropping happens at the very end.
I cannot reproduce your HSV error; can you tell me exactly what you did, with which settings, etc.?

Off-topic: I am in favor of using rescaling over cropping. A rescale augmentation is more methodologically correct in my opinion, mostly because with cropping you generally have the problem of 'how do I validate my source images?', which in DIGITS-Torch is answered with a center crop.
If you stop using cropping and use rescaling instead, you can still (when zoomed in or out) wiggle your image around the canvas, just as with cropping (think about it). Your mean 'rescale' will be 1:1, just like your validation-pass image, which is used entirely and at 1:1.
But, for historic reasons, everyone just uses cropping.
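
As an illustration of 'rescale and wiggle instead of crop' (a hypothetical NumPy sketch, not DIGITS code; nearest-neighbour resizing keeps it dependency-free):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W (x C) array."""
    in_h, in_w = img.shape[:2]
    rows = (np.arange(out_h) * in_h // out_h).astype(int)
    cols = (np.arange(out_w) * in_w // out_w).astype(int)
    return img[rows][:, cols]

def rescale_and_place(img, canvas=64, scale_std=0.05, rng=None):
    """Resize by a random factor around 1.0 and place at a random offset.

    On average the augmentation is the identity (factor 1.0, centred on
    the canvas), matching how the unaugmented validation image is used.
    """
    rng = rng or np.random.default_rng()
    factor = 1.0 + rng.normal(0.0, scale_std)
    new = max(1, int(round(canvas * factor)))
    resized = resize_nearest(img, new, new)
    out = np.zeros((canvas, canvas) + img.shape[2:], dtype=img.dtype)
    max_off = canvas - new
    top = int(rng.integers(min(0, max_off), max(0, max_off) + 1))
    left = int(rng.integers(min(0, max_off), max(0, max_off) + 1))
    # Negative offsets mean the over-scaled image is randomly cropped instead.
    src_t, dst_t = max(0, -top), max(0, top)
    src_l, dst_l = max(0, -left), max(0, left)
    h = min(new - src_t, canvas - dst_t)
    w = min(new - src_l, canvas - dst_l)
    out[dst_t:dst_t + h, dst_l:dst_l + w] = resized[src_t:src_t + h, src_l:src_l + w]
    return out
```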

@@ -0,0 +1,67 @@
{# Copyright (c) 2014-2016, NVIDIA CORPORATION. All rights reserved. #}
Contributor:

Minor comment: you can make this 2016 only, since this is a new file.

@pansk commented Jun 8, 2016

I'm sorry for the delay; I'm verifying, because it doesn't seem to converge well and I want to be sure it's not something related to my specific model.

@TimZaman (Contributor, Author) commented Jun 8, 2016

Strange. What kind of model are you using, and which dataset? I verified this to work well on a few different datasets, although the HSV augmentation could be slightly improved. Especially the horizontal flipping is very straightforward and should help; rotation by a few degrees helps on most datasets, and scaling too.


@pansk commented Jun 9, 2016

I'm pretty sure it's my code; that's why I'm going to try with a simpler autoencoder model, but I have to build a specific dataset for that (MNIST is grayscale only).

@pansk commented Jun 9, 2016

Just for reference, I was going to post this comment a few days ago:

I'm training an autoencoder on some custom datasets of uniform images (like wood, stone walls, grass, and so on), so I don't really need to validate my input, but border effects might be annoying for me. On the other hand, I'd like to provide just a bunch of pictures (possibly of different sizes) and let DIGITS generate its own set of images by extracting random regions, rotating, scaling, adding noise, and so on. That's why I'd like to extract more than one region per source image.

My parameters were:

[screenshot of the augmentation settings used]

For the general case, I agree about cropping vs. scaling (actually, scaling gives you more robustness).

Anyway, are you sure DIGITS-Torch just uses center cropping? If so, the balloon help for the crop parameter is a bit misleading: "If specified, during a training a random square crop will be taken from the input image before using as input for the network."

@gheinrich (Contributor):

are you sure digits-torch just uses center-cropping?

@TimZaman probably meant to say that we are doing centre cropping during validation. During training, we are doing random cropping.

@TimZaman (Contributor, Author) commented Jun 9, 2016

are you sure digits-torch just uses center-cropping?

@TimZaman probably meant to say that we are doing centre cropping during validation. During training, we are doing random cropping.

Correct, I said, verbatim:

(...) with cropping you generally have a problem of 'how do I validate my source images', which in Digits-torch is a center-crop.

@pansk commented Jun 9, 2016

Sorry Tim, I didn't notice you were referring to validation only.
Do any of you know the reason behind this choice?

@TimZaman (Contributor, Author) commented Jun 9, 2016

It's just one valid choice among many. If you want to be fancier, you can take a few crops and average their predictions (or take the max before normalizing). There's really no 'correct' way of validating when you are cropping, since you have to cut parts of your source image off during validation, because your model's input is smaller than your actual images. But in practice, performance-wise, it doesn't matter much.
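
For example, averaging over a few crops at validation time could look roughly like this (hypothetical sketch; `model_fn` stands in for whatever maps a single crop to class probabilities):

```python
import numpy as np

def predict_multicrop(model_fn, img, crop=224, n_crops=5, rng=None):
    """Average predictions over the centre crop plus a few random crops."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    tops = [(h - crop) // 2] + list(rng.integers(0, h - crop + 1, n_crops - 1))
    lefts = [(w - crop) // 2] + list(rng.integers(0, w - crop + 1, n_crops - 1))
    preds = [model_fn(img[t:t + crop, l:l + crop]) for t, l in zip(tops, lefts)]
    return np.mean(preds, axis=0)
```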

@TimZaman (Contributor, Author):

With the latest revision, and if Travis agrees with me, this PR is done, I think.

@gheinrich (Contributor):

This all looks good to me, thanks for the awesome PR! Can you squash your commits (possibly rebase too)?

Implemented in python and ui

Implemented dynamic UI toggle and moved augmentation html to template

Fixes uncovered during testing

Added AWGN augmentation, reduced complexity, typos, syntax fixes

Implemented a test to check at least all augmentations will run

Added test initialization params

Trivial language fixes and a few bugs

@TimZaman (Contributor, Author):

Squashed & rebased.

@gheinrich (Contributor):

Splendid!

@philipperemy:

Any updates since then?

@lukeyeager (Member):

This looks good to me!

@gheinrich please merge unless you have more concerns.

@TimZaman (Contributor, Author):

The 'data hook' idea has also grown on me. I think it's a great (and pretty straightforward) feature. But I do like the UI this PR brings, for obvious reasons. It's too bad Caffe doesn't have great augmentation layers in its master, although there are some pro forks.
Also, there is a proper PR in Caffe for a confusion matrix layer. If that gets added, Caffe could show it during training; as with Torch, it can be easily captured with a regex. Mkay, I digress.


@gheinrich gheinrich merged commit 9ae9fa0 into NVIDIA:master Jul 26, 2016