
LSTM: Many to many sequence prediction with different sequence length #6063

Closed
Ironbell opened this issue Mar 30, 2017 · 17 comments

@Ironbell

First of all, I know that there are already issues open regarding that topic, but their solutions don't solve my problem and I'll explain why.

The problem is to predict the next n_post steps of a sequence given n_pre steps of it, with n_pre < n_post. I've built a toy example using a simple sine wave to illustrate it. The many to one forecast (n_pre=50, n_post=1) works perfectly:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

# hidden_neurons = number of LSTM units, defined earlier in the script
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])

[plot: many-to-one forecast (n_pre=50, n_post=1)]

Also, the many to many forecast with (n_pre=50, n_post=50) gives a near perfect fit:

model = Sequential()  
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=True))  
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))   
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])  

[plot: many-to-many forecast (n_pre=50, n_post=50)]

But now assume we have data that looks like this:
dataX or input: (nb_samples, nb_timesteps, nb_features) -> (1000, 50, 1)
dataY or output: (nb_samples, nb_timesteps, nb_features) -> (1000, 10, 1)

The solution given in #2403 is to build the model like this:

model = Sequential()  
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))  
model.add(RepeatVector(10))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))   
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])  

Well, it compiles and trains, but the prediction is really bad:

[plot: many-to-many forecast with RepeatVector (n_pre=50, n_post=10)]

My explanation for this is: with return_sequences=False the network has only a single vector at the end of the LSTM layer, repeats it output_dimension times and then tries to fit. The best guess it can give is the average of all the points to predict, because it no longer knows whether the sine wave is currently going up or down; that information is lost with return_sequences=False!

So, my final question is: how can I keep this information and let the LSTM layer return only part of its sequence? I don't want to fit to n_pre=50 time steps but only to 10, because in my real problem the points are not as nicely correlated as in this sine wave. Currently I feed in 50 points and crop the output (after training) to 10, but the model still tries to fit all 50, which distorts the result.

Any help would be greatly appreciated!

@javiercorrea

I think you need to do something like this:

model = Sequential()  
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))  
model.add(RepeatVector(10))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))  
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))   
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])  

otherwise you are just repeating the same vector into the last Dense layer and getting a constant value.

@Ironbell
Author

Thank you very much. I tried your suggestion and the predictions now look like this:

[plot: prediction with the two-LSTM encoder-decoder model (n_pre=50, n_post=10)]

The number of epochs and hidden neurons is the same as in the other test cases, but the prediction for 10 steps is worse than it was for 50. Is there a (simple) explanation for why it gets worse with more layers? Or does it just need to train longer because it has more parameters to adjust?

@javiercorrea

I would say that the modeling assumptions of the two approaches are different. In the latter model, it is assumed that the model sees the complete input sequence (the first 50 steps), somehow creates a summary, and uses this summary to generate a new signal (the last 10 steps).

On the other hand, your initial model estimated the last 50 steps while reading the input signal; no summarisation of the original signal was used.

@Ironbell
Author

That's a perfect and clear answer, thank you very much.

@ghost

ghost commented Jun 30, 2017

Hi, I have been studying how to use the many-to-many LSTM model to predict time-series data, and I now have the same problem that you once had. Could you share your demo .py files for predicting a simple sine wave? I would like to study your code and replace your data with mine, just to try it. It would be very kind of you to help me out. Thanks!
My email: zhangping16@mails.ucas.ac.cn

@Ironbell
Author

Here you go!
test_sine.txt

@bestazad

bestazad commented Mar 1, 2019

Hi there!
It seems that in newer versions of Keras the input_dim and output_dim arguments have been replaced (the input shape is now given via the input_shape argument). Could you edit these parts of the code to match the new version:


model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))  
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))

I also have another question: what is the reason for using model.add(Activation('linear'))?
Thanks in advance!

@pusj

pusj commented Mar 5, 2019

Hi @bestazad ,

You can obtain the same result using input_dim or input_shape; to my knowledge, both of these "alternatives" have been available for quite some time:

https://stackoverflow.com/questions/53106111/in-keras-when-should-i-use-input-shape-instead-of-input-dim

The reason why model.add(Activation('linear')) is used is most likely that this is only a tentative example; other activation functions can probably give similar results here.
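
For reference, here is a rough sketch of how the RepeatVector model from earlier in this thread might look with the newer API (input_shape and units instead of input_dim and output_dim); n_pre, n_post and hidden_neurons are placeholder values, not taken from the original code:

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

n_pre, n_post, hidden_neurons = 50, 10, 64   # placeholder sizes

model = Sequential()
# input_shape=(timesteps, features) replaces input_dim; the first positional
# argument (units) replaces output_dim
model.add(LSTM(hidden_neurons, input_shape=(n_pre, 1), return_sequences=False))
model.add(RepeatVector(n_post))
model.add(LSTM(hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')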

@gustavz

gustavz commented Jul 8, 2019

How would you train the model on variable input length?

@pusj

pusj commented Jul 8, 2019

Hi @gustavz

Two options/suggestions:

  1. Padding https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
  2. Sequence bucketing https://arxiv.org/ftp/arxiv/papers/1708/1708.05604.pdf

Padding looks easier but I would guess that this method also decreases the usefulness of the model.
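
As a rough illustration of option 1 (not from the thread): variable-length sequences can be zero-padded to a common length, and a Masking layer tells the LSTM to ignore the padded steps. The lengths and layer sizes below are made up for the example:

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# toy sequences of different lengths, one feature per time step
seqs = [np.random.rand(n, 1) for n in (30, 45, 50)]

# pad every sequence with zeros at the end up to the longest length
X = pad_sequences(seqs, maxlen=50, dtype='float32', padding='post')   # shape (3, 50, 1)

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(50, 1)))   # padded steps are skipped downstream
model.add(LSTM(32))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop')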

If you (or anybody else) could help me with a good explanation of what RepeatVector() does here, I would be happy. The best reference I found is https://stackoverflow.com/questions/51749404/how-to-connect-lstm-layers-in-keras-repeatvector-or-return-sequence-true , but that is for an Encoder/Decoder network and I'm not sure whether the same applies to an LSTM network. E.g., does RepeatVector() repeat the original input (from the very first layer), or does it work with the inputs/outputs between hidden layers?
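
As far as I understand the Keras docs, RepeatVector works on the 2-D output of whatever layer feeds into it (not the original input): it simply tiles that vector n times along a new time axis. A tiny illustrative check, assuming a recent Keras:

import numpy as np
from keras.models import Sequential
from keras.layers import RepeatVector

model = Sequential([RepeatVector(4, input_shape=(3,))])   # repeat the incoming 2-D vector 4 times
x = np.array([[1.0, 2.0, 3.0]])                           # shape (1, 3): one sample, three features
y = model.predict(x)                                      # shape (1, 4, 3): same vector tiled along a new time axis
print(y.shape)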

@gustavz

gustavz commented Jul 9, 2019

what is the difference between:

model = Sequential()  
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))  
model.add(RepeatVector(10))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))  
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear')) 

and

model = Sequential()  
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=True))  
model.add(LSTM(output_dim=hidden_neurons, return_sequences=False))  
model.add(Dense(10))

maybe best explained with this image

@pusj

pusj commented Jul 9, 2019

Thanks for this; which is which? I've added some numbers to your image to better reference the variants. I assume that the code that contains RepeatVector() is represented by variant 4 and the code that does not contain RepeatVector() by variant 5. Is this correct?

[annotated image: the diagram above with the variants numbered]

Thanks! :-)

@gustavz

gustavz commented Jul 9, 2019

Option 1 is an Encoder-Decoder, Option 2 is a Vanilla LSTM

@byamao1

byamao1 commented Feb 6, 2020

Option 1 is part 4 of the image?

@0xsimulacra


To answer the question above: the code that does not contain RepeatVector() is a many-to-one architecture (variant 3). To get a many-to-many architecture you have to modify it so that return_sequences=True is set in both LSTM layers, not only the first one.
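
A minimal sketch of that modification, with placeholder sizes (with return_sequences=True in both layers, the output sequence has the same length as the input):

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

n_steps, hidden_neurons = 50, 64   # placeholder sizes

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(n_steps, 1), return_sequences=True))
model.add(LSTM(hidden_neurons, return_sequences=True))   # keep the full sequence in the second layer too
model.add(TimeDistributed(Dense(1)))                     # one output value per input time step
model.compile(loss='mean_squared_error', optimizer='rmsprop')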

@akshat-suwalka

akshat-suwalka commented Jul 17, 2021

Can anybody help me write the code for the 5th case of the above image?
Specifically in Keras.

@GODJOSE27

Can anyone help me with forecasting time series using a CNN-LSTM? I tried, but it doesn't attach the forecast to the test data.
You can reach me via godjose70@yahoo.com
