AdaBoost Ensemble ML Algorithm - Overfitting

ss7dec · May 22, 2021, 7:19pm

Hi CloudX Lab Team,

Another clarification is required wrt Overfitting of Adaboost Ensemble ML Algorithm.

In case, if there is overfitting, we need to adopt the Regularization method.

Now based on this Regularization, we need to adopt either one of the 2 strategies viz.:
a) To reduce the number of Estimators (i.e. the number of Trees in the Forest). In other words, we need to reduce the number of Variables from the Sampling Dataset.
OR—
b) To to regularize the Base Estimator.

Now this Base Estimator contains the Base Decision Node.
Now this Base Decision Node lies close to the Root of the Tree.
This Base Decision Node comprises of Important Features that lie close to the root of the tree (as stated in Slide 71).
While growing the tree(s) in the forest, in order to split the nodes either of the two methodologies/strategies can be adopted viz.:
a) Random subset of Features/Predictors/Variables OR
b) Random threshold values for each feature

So technically is this the appropriate way of describing the Overfitting of AdaBoost Ensemble ML Algorithm & how to resolve it???

Kindly correct me (each & every step), if my though process is in a wrong direction.

Sincerely looking forward to your valuable inputs.

rajtilakb · May 24, 2021, 3:45am

AdaBoost is quite robust to overfitting. Having said that, it can be prone to overfitting in case of noisy datasets, or datasets with higher dimensions. In case of noisy datasets, you can try the regularized version of AdaBoost, which is AdaBoostReg. You can also think about using heuristic methods such as validation sets or k-fold cross-validation to set the stopping parameter (or other parameters in the different variants) as you would for any other classifier. There are a number of papers available related to this subject which are a very good read, you can go through them if you like. I have provided the link to 2 of them below:

https://easychair.org/publications/preprint/rFCW

ss7dec · May 24, 2021, 6:47pm

Indeed a valuable reference…thanks Rajtilak for sharing it!!!