
# Tanh instead of sigmoid


What happens if I replace sigmoid with tanh in RNN gates?

## Motivation

sigmoid(x) is typically used for gates, but it is not symmetric: sigmoid(+2) ~= 0.88 and the gate is open, while sigmoid(-2) ~= 0.12 and the gate is closed.

My input data is largely symmetric, so I wonder whether a more symmetric gating function would speed up learning. The Strongly Typed RNN paper mentioned above uses tanh as an output gating function in some cases. With tanh we have tanh(+2) ~= 0.96, tanh(-2) ~= -0.96, and tanh(0) = 0. Nice and symmetric.

Another option would be the TernaryTanh activation function, which is like tanh but flat around 0: f(x) = 1.5 * tanh(x) + 0.5 * tanh(-3 * x)
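For reference, here is a minimal sketch of the three candidate gate activations. I have written it in PyTorch purely for illustration (the actual `experiment.py` code is not shown on this page); the values in the comments are just the ones quoted above.

```python
import torch

def ternary_tanh(x):
    # TernaryTanh: behaves like tanh for large |x| but is nearly flat
    # around 0, so small pre-activations leave the gate near neutral.
    return 1.5 * torch.tanh(x) + 0.5 * torch.tanh(-3 * x)

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))  # ~[0.12, 0.50, 0.88] -- not symmetric around 0
print(torch.tanh(x))     # ~[-0.96, 0.00, 0.96] -- symmetric
print(ternary_tanh(x))   # ~[-0.95, 0.00, 0.95] -- symmetric, flat near 0
```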

## Method

I tried it out on the Mackey-Glass series using a variety of RNN types. I opted for one layer of 50 units, keeping the number of units constant rather than trying to keep the number of parameters constant.

The command line I used was `python experiment.py --data mackey_glass --epochs 15 --layers ???_50 --sigmoid ???`, where the `???` placeholders stand for the layer type and the gate activation being tested.

15 epochs is sufficient in most cases for training to slow to a crawl. For better results I should let the tests run much longer and average over 5 runs. Nevertheless, it is interesting to note that some models learn really quickly from the get-go. (See the sketch below for what swapping the gate activation looks like in practice.)
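To make the idea concrete, here is a minimal GRU-style cell with a pluggable gate activation. This is only an illustrative sketch; the `GatedCell` name and its exact wiring are my own for this example and are not the implementation in `experiment.py`.

```python
import torch
import torch.nn as nn

class GatedCell(nn.Module):
    """Minimal GRU-style cell where the gate activation is a parameter,
    so sigmoid can be swapped for tanh or ternary_tanh."""

    def __init__(self, input_size, hidden_size, gate_activation=torch.sigmoid):
        super().__init__()
        self.gate_activation = gate_activation
        self.gates = nn.Linear(input_size + hidden_size, 2 * hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        combined = torch.cat([x, h], dim=-1)
        z, r = self.gates(combined).chunk(2, dim=-1)
        z = self.gate_activation(z)  # update gate
        r = self.gate_activation(r)  # reset gate
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=-1)))
        return (1 - z) * h + z * h_tilde

# Example: a 50-unit cell gated with tanh instead of sigmoid.
cell = GatedCell(input_size=1, hidden_size=50, gate_activation=torch.tanh)
h = torch.zeros(1, 50)
h = cell(torch.randn(1, 1), h)
```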

## Preliminary results

Speed-wise, sigmoid seems fastest, tanh surprisingly yet consistently seems slightly slower, and TernaryTanh seems significantly slower.

| Layer type | Sigmoid loss | Tanh loss | TernaryTanh loss | Comments |
|------------|--------------|-----------|------------------|----------|
| SRU  | ~0.004 | ~0.0022 | ~0.0022 | |
| TRNN | ~0.008 | ~0.0023 | ~0.0021 | |
| LSTM | ~0.008 | flat    | flat    | |
| GRU  | ~0.006 | ~0.0023 | ~0.0023 | |
| RAN  | ~0.008 | ~0.05   | flat    | tanh gets to 0.002 after 20 more epochs |
| CFN  | ~0.008 | ~0.04   | flat    | tanh gets to 0.006 after 20 more epochs |
| MGU2 | ~0.008 | nan     | nan     | I have yet to understand why I get nan losses here |

## Results

When I have some time to spare I shall run more comprehensive tests.