The suggestion is to use the first year as training data and the second year as test data. If I make sure OrderDate is a datetime and split the data this way (with 1/1/2015 as the divider), I get about 74k training rows and 197k test rows. That doesn't seem like the right way to split. Am I missing something?
I was actually thinking the same thing. I usually go with a 70:30 or 80:20 train:test ratio.
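For anyone comparing the two approaches discussed above, here is a minimal sketch of a fixed-date split next to a chronological ratio split. It uses only the Python standard library and synthetic rows (one order per day in 2014, three per day in 2015), so the counts are illustrative and not the actual dataset's. The helper names `split_by_cutoff` and `split_by_ratio` are made up for this example; only the `OrderDate` field comes from the thread.

```python
from datetime import date, timedelta


def split_by_cutoff(rows, cutoff):
    """Rows dated before the cutoff go to train, the rest to test."""
    train = [r for r in rows if r["OrderDate"] < cutoff]
    test = [r for r in rows if r["OrderDate"] >= cutoff]
    return train, test


def split_by_ratio(rows, train_fraction=0.8):
    """Sort by date, then cut at a row-count fraction instead of a calendar date."""
    ordered = sorted(rows, key=lambda r: r["OrderDate"])
    n_train = int(len(ordered) * train_fraction)
    return ordered[:n_train], ordered[n_train:]


# Synthetic data (an assumption, not the real dataset):
# 1 order per day in 2014 and 3 orders per day in 2015.
rows = []
start = date(2014, 1, 1)
for i in range(730):
    day = start + timedelta(days=i)
    per_day = 1 if day.year == 2014 else 3
    rows.extend({"OrderDate": day} for _ in range(per_day))

train_c, test_c = split_by_cutoff(rows, date(2015, 1, 1))
train_r, test_r = split_by_ratio(rows, train_fraction=0.8)

print("cutoff split:", len(train_c), "train /", len(test_c), "test")
print("ratio split: ", len(train_r), "train /", len(test_r), "test")
```

When order volume grows from one year to the next, the calendar cutoff leaves the training set much smaller than the test set, which is the imbalance described in the first post. The ratio split keeps the time ordering (training rows still precede test rows) while fixing the proportion.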
Yes, @Ross_Burns and @Gopi_Raga, you are right. This has been changed in the problem statement. We should try to ensure that important features are represented appropriately in both the training and test datasets.
Please continue discussing the problem on the forum, which will help others benefit from the platform.
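One possible way to make sure an important categorical feature shows up in both sets is a stratified split: split each category separately and then combine the pieces. The sketch below is only an illustration under assumptions. The column name `Category`, the helper `stratified_split`, and the synthetic rows are hypothetical and do not come from the problem statement. Note that it shuffles within each category, so it ignores time order and is not a drop-in replacement for a date-based split.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=0):
    """Split each group (by `key`) separately so every group with at least
    two rows is represented in both the training and the test set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    train, test = [], []
    for members in groups.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        if len(members) > 1:
            # Send at least one row to test, but always leave one for training.
            n_test = max(1, round(len(members) * test_fraction))
            n_test = min(n_test, len(members) - 1)
        else:
            n_test = 0  # a single-row group stays in training
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Synthetic rows with a hypothetical "Category" column of uneven sizes.
sample = (
    [{"Category": "A"} for _ in range(50)]
    + [{"Category": "B"} for _ in range(10)]
    + [{"Category": "C"} for _ in range(2)]
)

train_s, test_s = stratified_split(sample, key="Category", test_fraction=0.2)
print(sorted({r["Category"] for r in train_s}))
print(sorted({r["Category"] for r in test_s}))
```

A plain random split could leave a rare category such as `C` out of the test set entirely, which is the kind of gap the reply above warns about.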
Thank you for clarifying