Tanh instead of sigmoid
`sigmoid(x)` is typically used for gates, but it is not symmetric: `sigmoid(+2) ~= 0.88` and the gate is open, but `sigmoid(-2) ~= 0.12` and the gate is closed.
Now my input data is largely symmetric, so I wonder whether a more symmetric gating function would speed up learning. The Strongly Typed RNN paper mentioned above uses tanh as an output gating function in some cases. With tanh we have `tanh(+2) ~= 0.96`, `tanh(-2) ~= -0.96`, and `tanh(0) = 0`. Nice and symmetric.
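A quick check of those numbers in plain NumPy (independent of the experiment code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:+.0f}   sigmoid = {sigmoid(x):.2f}   tanh = {np.tanh(x):+.2f}")
# x = -2   sigmoid = 0.12   tanh = -0.96
# x = +0   sigmoid = 0.50   tanh = +0.00
# x = +2   sigmoid = 0.88   tanh = +0.96
```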
Another option would be to use the TernaryTanh activation function, which is like tanh but flat around 0: `f(x) = 1.5 * tanh(x) + 0.5 * tanh(-3 * x)`.
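A minimal NumPy sketch of that formula (the name `ternary_tanh` is just for illustration here, not necessarily what the experiment code calls it):

```python
import numpy as np

def ternary_tanh(x):
    """1.5 * tanh(x) + 0.5 * tanh(-3x): saturates at +-1 like tanh, but is nearly flat near 0."""
    return 1.5 * np.tanh(x) + 0.5 * np.tanh(-3.0 * x)

for x in (-2.0, -0.25, 0.0, 0.25, 2.0):
    print(f"ternary_tanh({x:+.2f}) = {ternary_tanh(x):+.3f}")
# ternary_tanh(-2.00) = -0.946
# ternary_tanh(-0.25) = -0.050
# ternary_tanh(+0.00) = +0.000
# ternary_tanh(+0.25) = +0.050
# ternary_tanh(+2.00) = +0.946
```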
I tried it out on the Mackey-Glass series using a variety of RNN types. I opted for one layer of 50 units, keeping the number of units constant rather than seeking to keep the number of parameters constant.
The command line I used was `python experiment.py --data mackey_glass --epochs 15 --layers ???_50 --sigmoid ???`.
15 epochs is sufficient in most cases for training to slow to a crawl. For better results I should let the tests run much longer and average over 5 runs. Nevertheless, it is interesting to note that some models learn really quickly from the get-go.
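For concreteness, here is a toy NumPy sketch, not the repo's `experiment.py`, of what swapping the gate nonlinearity means: the same output-gated cell, with the gate activation passed in as a parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ternary_tanh(x):
    return 1.5 * np.tanh(x) + 0.5 * np.tanh(-3.0 * x)

def make_params(n_in, n_hid):
    """Small random weights for a toy single-gate recurrent cell (illustration only)."""
    init = lambda *shape: rng.normal(scale=0.1, size=shape)
    return dict(Wz=init(n_in, n_hid), Uz=init(n_hid, n_hid), bz=np.zeros(n_hid),
                Wg=init(n_in, n_hid), Ug=init(n_hid, n_hid), bg=np.zeros(n_hid))

def step(x, h_prev, p, gate=sigmoid):
    """One step of a toy output-gated cell: h_t = gate(...) * tanh(z_t)."""
    z = x @ p["Wz"] + h_prev @ p["Uz"] + p["bz"]        # candidate pre-activation
    g = gate(x @ p["Wg"] + h_prev @ p["Ug"] + p["bg"])  # gate, with a swappable nonlinearity
    return g * np.tanh(z)

# The same cell runs with sigmoid, tanh or ternary_tanh as the gate activation:
p = make_params(n_in=1, n_hid=50)
h = np.zeros(50)
for x_t in rng.normal(size=(10, 1)):                    # a toy input sequence of length 10
    h = step(x_t, h, p, gate=ternary_tanh)
```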
Speed-wise, sigmoid seems fastest; tanh, surprisingly yet consistently, seems slightly slower; and TernaryTanh seems significantly slower.
| Layer type | Sigmoid loss | Tanh loss | TernaryTanh loss | Comments |
|---|---|---|---|---|
| SRU | ~0.004 | ~0.0022 | ~0.0022 | |
| TRNN | ~0.008 | ~0.0023 | ~0.0021 | |
| LSTM | ~0.008 | flat | flat | |
| GRU | ~0.006 | ~0.0023 | ~0.0023 | |
| RAN | ~0.008 | ~0.05 | flat | tanh gets to 0.002 after 20 more epochs |
| CFN | ~0.008 | ~0.04 | flat | tanh gets to 0.006 after 20 more epochs |
| MGU2 | ~0.008 | nan | nan | I have yet to understand why I get NaN losses here |
When I have some time to spare I shall run more comprehensive tests.