Computing Stratum / Strata

Hi,
For stratified sampling, we divided mean_monthly_income by 1.5 and then capped it at 5, so that we can create strata and then draw samples from each stratum for the train and test sets.
On what basis did we select 1.5?

If you look at the distribution, the values are spread between 0 and 15.

Dividing the values by 1.5 brings the range down to roughly 0 to 10, which is more manageable. The choice of 1.5 is not special: you could also divide by 2 or 3 if you prefer.

Here, we divide the median income by 1.5 so that we do not end up with too many strata and each stratum is large enough. Capping at 5 then merges all the higher values into a single stratum.
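
To make the idea concrete, here is a minimal sketch in plain Python. It is not the course's exact code: the function names `income_stratum` and `stratified_split`, the 20% test ratio, and the sample data are all made up for illustration, and rounding up with `ceil` is my assumption, since the post only mentions dividing by 1.5 and capping at 5.

```python
import math
import random
from collections import defaultdict


def income_stratum(income, divisor=1.5, cap=5):
    """Map a continuous income value to a discrete stratum label."""
    # Assumption: round up after dividing, then cap the label at `cap`.
    return min(math.ceil(income / divisor), cap)


def stratified_split(values, test_ratio=0.2, seed=42):
    """Return (train, test) index lists with the same test share per stratum."""
    rng = random.Random(seed)

    # Group row indices by their stratum label.
    by_stratum = defaultdict(list)
    for idx, value in enumerate(values):
        by_stratum[income_stratum(value)].append(idx)

    train, test = [], []
    for stratum in sorted(by_stratum):
        indices = by_stratum[stratum]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Made-up income values spread between 0.5 and 15, for illustration only.
data_rng = random.Random(0)
incomes = [data_rng.uniform(0.5, 15.0) for _ in range(1000)]

train_idx, test_idx = stratified_split(incomes)
strata_labels = sorted({income_stratum(v) for v in incomes})
```

With a divisor of 1.5 and a cap of 5, every value above 6 falls into stratum 5. A larger divisor such as 2 or 3 would give fewer distinct labels below the cap, so fewer and larger strata.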
