Demand Forecasting Data Split

Ross_Burns · January 15, 2018, 3:31am

It is suggested to use the first year as training data and the second year as test data. If I make sure OrderDate is a datetime and split the data this way (with 1/1/2015 as the divider), I get 74k rows for training and 197k rows for testing. That doesn’t seem like the right way to split. Am I missing something?
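A minimal pandas sketch of the year-based split described in the question, assuming the data sits in a DataFrame with an OrderDate column. The helper name split_by_date and the toy rows are hypothetical. Printing the train share makes an imbalance obvious before any model is fitted: 74k of roughly 271k rows is only about 27% for training.

```python
import pandas as pd

def split_by_date(df: pd.DataFrame, date_col: str, cutoff: str):
    """Hypothetical helper: rows before cutoff go to train, rows on/after it go to test."""
    # Parse the column so comparisons are by date, not by string
    dates = pd.to_datetime(df[date_col])
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[dates < cutoff_ts]
    test = df[dates >= cutoff_ts]
    return train, test

# Toy data; only OrderDate comes from the thread, Quantity is made up
orders = pd.DataFrame({
    "OrderDate": ["2014-03-01", "2014-11-20", "2015-01-05", "2015-06-30", "2015-12-31"],
    "Quantity": [5, 3, 8, 2, 7],
})

train, test = split_by_date(orders, "OrderDate", "2015-01-01")
share = len(train) / len(orders)
print(len(train), len(test), f"{share:.0%} train")  # 2 3 40% train
```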

Gopi_Raga · January 15, 2018, 4:44am

I was actually thinking the same thing. I usually use a 70:30 or 80:20 train:test ratio.
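One way to apply a 70:30 or 80:20 ratio to forecasting data while keeping the time order is to sort by date and take the earliest fraction of rows as training data, so evaluation always happens on later periods. A minimal sketch, assuming pandas; the helper chronological_split and the toy data are hypothetical. Note that rows sharing a date at the boundary can end up on both sides.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, date_col: str, train_frac: float = 0.8):
    """Hypothetical helper: earliest train_frac of rows -> train, the rest -> test."""
    # Parse dates, then sort so the split respects time order (no shuffling)
    ordered = df.assign(**{date_col: pd.to_datetime(df[date_col])}).sort_values(date_col)
    n_train = int(len(ordered) * train_frac)
    return ordered.iloc[:n_train], ordered.iloc[n_train:]

# Toy data: ten monthly orders, deliberately stored newest-first
orders = pd.DataFrame({
    "OrderDate": pd.date_range("2014-01-01", periods=10, freq="MS")[::-1],
    "Quantity": range(10),
})

train, test = chronological_split(orders, "OrderDate", train_frac=0.8)
print(len(train), len(test))                                # 8 2
print(train["OrderDate"].max() < test["OrderDate"].min())  # True
```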

Pratik_Sonthalia · January 21, 2018, 1:17pm

Yes, @Ross_Burns and @Gopi_Raga, you are right. This has been changed in the problem statement. We should try to ensure that important features are covered appropriately in both the training and test datasets.
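A simple way to check whether categorical features are covered in both splits is to list values that appear in the test set but never in training. A minimal pandas sketch; the helper missing_from_train and the Product and Region columns are hypothetical, not taken from the problem statement.

```python
import pandas as pd

def missing_from_train(train: pd.DataFrame, test: pd.DataFrame, cols):
    """Hypothetical helper: for each column, values seen in test but never in train."""
    gaps = {}
    for col in cols:
        unseen = set(test[col].unique()) - set(train[col].unique())
        if unseen:
            gaps[col] = sorted(unseen)
    return gaps

# Toy splits with made-up columns
train = pd.DataFrame({"Product": ["A", "B", "A"], "Region": ["East", "West", "East"]})
test = pd.DataFrame({"Product": ["A", "C"], "Region": ["East", "West"]})

print(missing_from_train(train, test, ["Product", "Region"]))  # {'Product': ['C']}
```

An empty result means every category in the test set was seen during training.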

Please continue discussing the problem on the forum; it helps others benefit from the platform.

Gopi_Raga · January 21, 2018, 1:20pm

Thank you for clarifying