Got a question about on end to end project. If we are given training, validation, and text set separately, how do evaluation? Do we combine training and validation sets, use the cross_val_score function and do cross-validation to evaluate the model?
Hi,
We split our dataset in three sets. So we train our model on training data and test our model on testing data. But, it will be a complex process to train our model completely and then test it on the testing data and then update the model according to it and retrain it because training models take much time. So for that, we use validation dataset. In easy words, we can understand it as test data while training the model. So we keep on testing or validating the model by validation data while the training is going on so that we can know our training is going in the right direction and if any improvement needed we can do it there and then and train the model further. So we don’t have to wait for the model to train completely and check it’s performance and then update our model according to it. It saves us a lot of time as we can do improvements in mid-training.
Another important reason to split the data into three sets is to prevent more of overfitting. Sometime it happens that our model starts overfitting the testing data too with the training data. So we use validation set because it’s more rare for a model to overfit all three datasets.