Flux.LSTM() returns a sequence of states #1406

Closed · deveshjawla opened this issue Nov 26, 2020 · 9 comments
@deveshjawla commented Nov 26, 2020

Hi guys,
I noticed that by default the LSTM layer returns a sequence of states instead of just the last value. In Keras one can set return_sequences=False to get only the last output; does Flux offer something similar?
[Screenshot from 2020-11-25 23-55-37]

@bhvieira (Contributor)

You can simply take the last value with array indexing. In your example: l(a)[end]

@deveshjawla (Author) commented Nov 27, 2020

Yes, I figured. But it would be nice to have this functionality built in: ideally one would feed the last value as input to, say, a Dense layer, but if the LSTM only returns a sequence, I can't see how a Dense layer is supposed to process it. I looked into the source code, and it doesn't seem to provide any kwargs such as return_sequences=false. Unless I'm obviously missing something, I would probably implement this functionality myself.
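For the record, a minimal sketch of what such a wrapper could look like (LastOnly is a made-up name, not part of Flux's API, and this assumes the input is a vector of per-timestep arrays):

```julia
using Flux

# Hypothetical Keras-style return_sequences=false wrapper; LastOnly is
# purely illustrative and not part of Flux.
struct LastOnly{T}
    rnn::T
end
Flux.@functor LastOnly                # so the wrapped parameters are collected for training
(l::LastOnly)(xs) = l.rnn.(xs)[end]   # xs is a vector of timesteps; keep only the final output

model = Chain(LastOnly(LSTM(2, 5)), Dense(5, 3))
xs = [rand(Float32, 2, 10) for _ in 1:20]  # 20 timesteps, batch size 10
model(xs)                                  # 3×10: only the last state reaches the Dense layer
```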

@bhvieira (Contributor)

I didn't catch it at first, but in your example what you actually have is a batch. The return value corresponds to a single timestep of an LSTM.

@deveshjawla (Author)

In that case, if we reshape to (features, timesteps, batches) and feed it as one batch, it gives nonsensical results, as can be seen here:
[Screenshot from 2020-11-27 09-02-13]

@deveshjawla (Author)

Another minimal example shows that if we feed a batch of 1 sample, it still returns a sequence whose length equals the number of timesteps.
[Screenshot from 2020-11-27 08-58-13]

@deveshjawla (Author)

I believe, as an easy fix, one could also plug (x -> x[:,end]) into the Chain so that the succeeding layers are fed only the last state:
Chain(LSTM(n,m), x -> x[:,end], Dense(m,p))

@bhvieira (Contributor)

Each call to the LSTM is one timestep. You can use broadcasting, but not in the way you used it.
You'd need to broadcast the LSTM forward pass over an array distributed in time, i.e. each element of the array is a timestep.
So a = [randn(2,10) for _ in 1:20] would be an input with batch size 10 and 20 timesteps.
Then you'd call l.(a), broadcasting the LSTM forward pass over the array in time.
To keep the last element and apply a Dense layer to it, it would simply be d(l.(a)[end]).
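To make that concrete, a minimal runnable sketch (the layer sizes here are arbitrary):

```julia
using Flux

l = LSTM(2, 5)                              # 2 input features, hidden size 5
d = Dense(5, 3)
a = [randn(Float32, 2, 10) for _ in 1:20]   # 20 timesteps, batch size 10
ys = l.(a)                                  # one forward call per timestep: 20 outputs, each 5×10
out = d(ys[end])                            # Dense applied to the last state only: 3×10
Flux.reset!(l)                              # reset the hidden state before the next sequence
```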

Another, very Julian, possibility would be mapslices, with time as the last dimension (good for memory management, IIRC), but I'm not sure how well mapslices works with Zygote right now. @DhairyaLGandhi and @CarloLucibello could comment on that, I think.
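For illustration only, the mapslices version might look roughly like this (whether it differentiates cleanly under Zygote is exactly the open question above; the dimension layout here is an assumption):

```julia
using Flux

l = LSTM(2, 5)
X = randn(Float32, 2, 10, 20)      # features × batch × time
Y = mapslices(l, X; dims=[1, 2])   # apply the forward pass to each 2×10 slice over time → 5×10×20
```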

@deveshjawla (Author)

Seems quite confusing and unconventional. If I have one timeseries of 2 categorical values, like rand([1,-1],100), and I make it one-hot and reshape it to (2,1,100), then what will the input dimensions of the LSTM layer be? If I use LSTM(2,1), it gives nonsensical output, like here:
[Screenshot from 2020-11-28 16-09-36]

@bhvieira (Contributor)

You shouldn't reshape it if you want to broadcast. You should keep each timestep as its own array (input size × batch size), and then broadcast over that.
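Concretely, for the categorical example above, that might look like this (a sketch; each timestep stays its own array instead of being reshaped into a 3-d array):

```julia
using Flux

series = rand([1, -1], 100)                      # one series, 100 timesteps
xs = [Flux.onehot(v, [1, -1]) for v in series]   # vector of 100 one-hot timesteps, each of length 2
l = LSTM(2, 1)
ys = l.(xs)   # 100 outputs, one per timestep
ys[end]       # the final state only
```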
