Flux.LSTM() returns a sequence of states #1406

Closed · deveshjawla opened this issue Nov 26, 2020 · 9 comments
@deveshjawla commented Nov 26, 2020

Hi guys,
I noticed that by default the LSTM layer returns a sequence of states instead of just the last value. In Keras one can set return_sequences=False to get only the last output; does Flux offer something similar?
[Screenshot from 2020-11-25 23-55-37]

@bhvieira (Contributor)

You can simply take the last value with array indexing. In your example: l(a)[end]

@deveshjawla (Author) commented Nov 27, 2020

Yes, I figured. But it would be nice to have this functionality built in: ideally one would feed the last value as input to, say, a Dense layer, but if the LSTM only returns a sequence, I can't see how a Dense layer is supposed to process it. I looked into the source code, and it doesn't seem to provide any kwargs such as return_sequences=false. Unless I'm obviously missing something, I would probably implement this functionality myself.
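For the record, a minimal sketch of what such a wrapper could look like (LastOnly is a made-up name, not part of Flux's API, and this assumes the input is a vector of per-timestep arrays):

```julia
using Flux

# Hypothetical Keras-style return_sequences=false wrapper; LastOnly is
# purely illustrative and not part of Flux.
struct LastOnly{T}
    rnn::T
end
Flux.@functor LastOnly                # so the wrapped parameters are collected for training
(l::LastOnly)(xs) = l.rnn.(xs)[end]   # xs is a vector of timesteps; keep only the final output

model = Chain(LastOnly(LSTM(2, 5)), Dense(5, 3))
xs = [rand(Float32, 2, 10) for _ in 1:20]  # 20 timesteps, batch size 10
model(xs)                                  # 3×10: only the last state reaches the Dense layer
```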

@bhvieira (Contributor)

I didn't catch it at first, but in your example what you actually have is a batch. The return value corresponds to a single timestep of an LSTM.

@deveshjawla (Author)

In that case, if we reshape to (features, timesteps, batches) and feed it as one batch, it gives nonsensical results, as can be seen here:
[Screenshot from 2020-11-27 09-02-13]

@deveshjawla (Author)

Another minimal example shows that if we feed a batch of 1 sample, it still returns a sequence whose length equals the number of timesteps.
[Screenshot from 2020-11-27 08-58-13]

@deveshjawla (Author)

I believe, as an easy fix, one could also plug (x -> x[:,end]) into the Chain so that the succeeding layers are fed only the last state:
Chain(LSTM(n,m), x -> x[:,end], Dense(m,p))

@bhvieira (Contributor)

Each call to the LSTM is one timestep. You can use broadcasting, but not in the way you used it.
You'd need to broadcast the LSTM forward pass over an array distributed in time, i.e. each element of the array is a timestep.
So a = [randn(2,10) for _ in 1:20] would be an input with batch size 10 and 20 timesteps.
Then you'd call l.(a), broadcasting the LSTM forward pass over the array in time.
To keep the last element and apply a Dense layer to it, it would simply be d(l.(a)[end]).
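To make that concrete, a minimal runnable sketch (the layer sizes here are arbitrary):

```julia
using Flux

l = LSTM(2, 5)                              # 2 input features, hidden size 5
d = Dense(5, 3)
a = [randn(Float32, 2, 10) for _ in 1:20]   # 20 timesteps, batch size 10
ys = l.(a)                                  # one forward call per timestep: 20 outputs, each 5×10
out = d(ys[end])                            # Dense applied to the last state only: 3×10
Flux.reset!(l)                              # reset the hidden state before the next sequence
```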

Another, very Julian, possibility would be mapslices, with time as the last dimension (good for memory management, IIRC), but I'm not sure how well mapslices works with Zygote right now. @DhairyaLGandhi and @CarloLucibello could comment on that, I think.
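For illustration only, the mapslices version might look roughly like this (whether it differentiates cleanly under Zygote is exactly the open question above; the dimension layout here is an assumption):

```julia
using Flux

l = LSTM(2, 5)
X = randn(Float32, 2, 10, 20)      # features × batch × time
Y = mapslices(l, X; dims=[1, 2])   # apply the forward pass to each 2×10 slice over time → 5×10×20
```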

@deveshjawla (Author)

Seems quite confusing and unconventional. If I have one timeseries of 2 categorical values, like rand([1,-1],100), and I make it one-hot and reshape it to (2,1,100), then what will the input dimensions of the LSTM layer be? If I use LSTM(2,1), it gives nonsensical output, like here:
[Screenshot from 2020-11-28 16-09-36]

@bhvieira (Contributor)

You shouldn't reshape it if you want to broadcast. You should keep each timestep as its own array (input size × batch size), and then broadcast over that.
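Concretely, for the categorical example above, that might look like this (a sketch; each timestep stays its own array instead of being reshaped into a 3-d array):

```julia
using Flux

series = rand([1, -1], 100)                      # one series, 100 timesteps
xs = [Flux.onehot(v, [1, -1]) for v in series]   # vector of 100 one-hot timesteps, each of length 2
l = LSTM(2, 1)
ys = l.(xs)   # 100 outputs, one per timestep
ys[end]       # the final state only
```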
