Session 5 | Regression Project, Build end to end Machine Learning Project

In the videos and PDF documents, both stratified sampling and random sampling are covered for generating the test set, and we later compared the income category proportions in each test set with the proportions in the overall dataset.

Should we run the same comparison (category proportions against the overall data) on the train set rather than the test set, since the train set is what the model is actually built on?

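You should usually compare the distribution on both the train and test sets against the overall dataset. The train set should be representative so the model learns from the right mix of categories, and the test set should be representative so the evaluation can be trusted.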
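
To make the comparison concrete, here is a minimal sketch that checks category proportions in the overall data, the train set, and the test set side by side. It is only an illustration: the `income_cat` values are synthetic (not the course dataset), and `proportions` and `stratified_split` are hypothetical helper names written with the standard library alone. In practice, scikit-learn's `train_test_split` with its `stratify` argument is the usual way to get a stratified split.

```python
import random
from collections import Counter, defaultdict


def proportions(labels):
    """Return {category: share of rows} for a list of category labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: counts[cat] / total for cat in sorted(counts)}


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Split row indices into train/test, sampling test_ratio within each category."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for idx, cat in enumerate(labels):
        by_cat[cat].append(idx)

    train_idx, test_idx = [], []
    for indices in by_cat.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Synthetic income categories 1-5 (assumed weights, for illustration only).
data_rng = random.Random(0)
income_cat = data_rng.choices([1, 2, 3, 4, 5], weights=[4, 32, 35, 18, 11], k=10_000)

train_idx, test_idx = stratified_split(income_cat)

overall = proportions(income_cat)
train = proportions([income_cat[i] for i in train_idx])
test = proportions([income_cat[i] for i in test_idx])

# Print the three distributions next to each other for a quick visual check.
print("cat  overall  train   test")
for cat in overall:
    print(f"{cat:>3}  {overall[cat]:.4f}  {train[cat]:.4f}  {test[cat]:.4f}")
```

With a stratified split, all three columns should be nearly identical. With a purely random split they can drift apart, especially for the smaller categories, and that drift appears in the train set as well as the test set.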