Input Data Shape for RNN, LSTM

Hi All,

RNN and LSTM layers expect input data of shape [samples, time steps, features].

Let me take an example: I have 10 videos (samples) and I extract 2 features every second from each of them. For 2-minute videos, my input shape would be [10, 120, 2]. I have two points of confusion here, as below.

(1) I may have videos of different lengths. Suppose I have two videos: the first is 2 minutes long and the second is 5 minutes long. In this case the input shape for the first would be [1, 120, 2] and for the second it would be [1, 300, 2].
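One common way to put sequences of different lengths into a single [samples, time steps, features] array is to pad the shorter ones and keep a mask that marks the real steps. The sketch below is only an illustration in plain Python; the helper name pad_sequences_to_batch and the feature values are made up for this example and are not from the thread. Frameworks such as Keras ship their own padding and masking utilities that would normally be used instead.

```python
def pad_sequences_to_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences to a common length.

    sequences: list of sequences, each a list of feature vectors (lists).
    Returns (batch, mask): batch has shape [samples, max_steps, features],
    and mask[i][t] is 1 for a real time step and 0 for a padded one.
    """
    max_steps = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    batch, mask = [], []
    for seq in sequences:
        n_pad = max_steps - len(seq)
        padded = [list(step) for step in seq]
        padded += [[pad_value] * n_features for _ in range(n_pad)]
        batch.append(padded)
        mask.append([1] * len(seq) + [0] * n_pad)
    return batch, mask


# Two videos sampled once per second: 2 minutes (120 steps) and
# 5 minutes (300 steps), each step holding 2 placeholder features.
video_a = [[1.0, 2.0] for _ in range(120)]
video_b = [[3.0, 4.0] for _ in range(300)]

batch, mask = pad_sequences_to_batch([video_a, video_b])
shape = (len(batch), len(batch[0]), len(batch[0][0]))
print(shape)         # (2, 300, 2)
print(sum(mask[0]))  # 120 real steps in the first video
print(sum(mask[1]))  # 300 real steps in the second video
```

The mask matters because the padded steps carry no information; a model that ignores the mask would treat the zeros as real observations.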

(2) I may not be able to extract features at every one-second interval. For example, I am detecting an object in the video and extracting its length and width as its features. The object may not be present at every second of the video, and even when it is, I may not be able to detect it correctly 100% of the time. In this situation my time steps will not be uniform, i.e. I may get features at seconds 1, 3, 5, 6, 7, 11, … etc. I mean that from a 2-minute video I may not get 120 time steps, and the ones I do get may not be evenly spaced.
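For unevenly spaced observations, one option (an assumption of this sketch, not something stated in the thread) is to keep only the steps where a detection exists and append the time elapsed since the previous detection as an extra feature, so the model can see the gaps. The function name add_time_deltas and the length/width values are hypothetical.

```python
def add_time_deltas(timestamps, features):
    """Append the time since the previous observation to each feature vector.

    timestamps: increasing observation times (here, seconds into the video).
    features:   one feature vector per timestamp.
    The first observation gets a delta of 0.0.
    """
    if len(timestamps) != len(features):
        raise ValueError("timestamps and features must have the same length")
    rows = []
    previous = None
    for t, feat in zip(timestamps, features):
        delta = 0.0 if previous is None else float(t - previous)
        rows.append(list(feat) + [delta])
        previous = t
    return rows


# Detections were only available at these seconds of the video.
seconds = [1, 3, 5, 6, 7, 11]
# Hypothetical (length, width) measurements for each detection.
detections = [[4.0, 2.0], [4.1, 2.0], [4.3, 2.1],
              [4.2, 2.1], [4.4, 2.2], [4.0, 2.0]]

sequence = add_time_deltas(seconds, detections)
print([row[-1] for row in sequence])    # [0.0, 2.0, 2.0, 1.0, 1.0, 4.0]
print((1, len(sequence), len(sequence[0])))  # (1, 6, 3)
```

The resulting sample has shape [1, 6, 3]: six observed steps, each with length, width, and the gap in seconds. Another option would be to resample onto a regular one-second grid and flag the missing steps; which works better depends on the data.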

I hope I have explained the situation and my doubt clearly.

How should these issues be handled in stateful NNs (RNN, LSTM)?
Are stateful NNs able to handle both of these situations on their own, so that we don't need to do anything about them?

Please explain in detail.

Regards
Manoj

Can someone from the CloudXLab team help?

Yes, in an RNN the sizes of the first two dimensions (batch size and sequence length) are variable.

As long as the data forms a sequence, it is okay.
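To illustrate why the sequence length can vary, here is a minimal sketch of a plain tanh RNN in pure Python. The weights depend only on the number of features and the hidden size, so the same weights can process a 120-step or a 300-step sequence. The function rnn_forward, the sizes, and the random weights are assumptions made for this example, not any particular library's implementation. Note that sequences stacked into one batch array still need a common length, which is why padding is typically used within a batch.

```python
import math
import random


def rnn_forward(sequence, w_in, w_rec, bias):
    """Run a simple tanh RNN over one sequence; return the final hidden state."""
    hidden_size = len(bias)
    h = [0.0] * hidden_size
    for x in sequence:
        # The comprehension reads the previous h before h is reassigned.
        h = [
            math.tanh(
                sum(w_in[j][k] * x[k] for k in range(len(x)))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
                + bias[j]
            )
            for j in range(hidden_size)
        ]
    return h


random.seed(0)
n_features, hidden_size = 2, 3
# Weight shapes depend on features and hidden size, not on sequence length.
w_in = [[random.uniform(-0.5, 0.5) for _ in range(n_features)]
        for _ in range(hidden_size)]
w_rec = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)]
         for _ in range(hidden_size)]
bias = [0.0] * hidden_size

short_seq = [[0.1, 0.2]] * 120  # 2-minute video, one step per second
long_seq = [[0.1, 0.2]] * 300   # 5-minute video, one step per second

h_short = rnn_forward(short_seq, w_in, w_rec, bias)
h_long = rnn_forward(long_seq, w_in, w_rec, bias)
print(len(h_short), len(h_long))  # 3 3
```

Both calls return a hidden state of the same size, regardless of how many time steps were fed in.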

Thanks a lot, Sandeep Ji, for clarifying my doubts. I had the same understanding but just wanted someone to confirm it.

In my understanding, the sequence does not necessarily need to be indexed by a date value; it can be a continuous sequence, 1, 2, 3, 4, …