Very good question!
To answer this, you need to understand how the decision tree algorithm works: each node splits the data space into pieces based on the value of a feature, not on the distance between values.
The rule of thumb is that any algorithm that computes distances or assumes normality needs its features scaled.
The following algorithms are sensitive to the scale of the features:
- k-nearest neighbors --> calculates the Euclidean distance between points.
- PCA --> looks for the directions of maximum variance, so it skews towards the high-magnitude features.
- Gradient descent --> “θ” descends quickly on small ranges and slowly on large ranges, so how fast it converges also depends on the scale of the features.
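To make the k-nearest neighbors point concrete, here is a minimal sketch in plain Python. The (age, income) rows, the query point and the helper names are made up for illustration, and min-max scaling is just one possible choice of scaler.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(points, query):
    """Index of the row in `points` closest to `query`."""
    return min(range(len(points)), key=lambda i: euclidean(points[i], query))

def min_max_scaler(points):
    """Build a function that rescales each feature using the ranges in `points`."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return lambda p: tuple((v - lo) / s for v, lo, s in zip(p, lows, spans))

# Hypothetical (age in years, income in dollars) rows, for illustration only.
points = [(27, 50400), (60, 50100), (45, 52000), (20, 48000)]
query = (25, 50000)

# Without scaling, income differences swamp age differences.
raw_nearest = nearest_index(points, query)

# After min-max scaling, both features contribute on a comparable scale.
scale = min_max_scaler(points)
scaled_nearest = nearest_index([scale(p) for p in points], scale(query))

print(points[raw_nearest])     # (60, 50100)
print(points[scaled_nearest])  # (27, 50400)
```

The nearest neighbor flips from the 60-year-old to the 27-year-old once the features are scaled, which is why distance-based models usually need scaling.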
But tree-based models are not distance-based models. They depend on the values of the features, not on the ranges of the features, which is why feature scaling may not have much effect on these algorithms.
The same goes for “Naive Bayes”, “Linear Discriminant Analysis” and, of course, “Random Forest”.
That is also why we say that “tree-based algos are robust to outliers”; you must have heard this statement during interviews.
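As a rough illustration of why scaling has little effect on trees, the sketch below finds the best single split (a one-node "stump") with binary Gini impurity, first on a raw feature and then on the same feature multiplied by 1000. The data and function names are hypothetical, and this is a simplified stand-in for what a real tree library does, not its actual implementation.

```python
def gini(labels):
    """Binary Gini impurity for a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Threshold (midpoint between sorted values) with the lowest weighted Gini."""
    distinct = sorted(set(values))
    best_score, best_t = None, None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best_score is None or score < best_score:
            best_score, best_t = score, t
    return best_t

# Hypothetical feature (income in thousands) and binary labels.
income_k = [20, 35, 50, 65, 80, 95]
labels = [0, 0, 0, 1, 1, 1]

# The same feature expressed in a different unit (a pure rescaling).
income = [v * 1000 for v in income_k]

t_raw = best_threshold(income_k, labels)
t_scaled = best_threshold(income, labels)

# Which rows go to the left child under each threshold.
left_raw = [x <= t_raw for x in income_k]
left_scaled = [x <= t_scaled for x in income]

print(t_raw, t_scaled)          # 57.5 57500.0
print(left_raw == left_scaled)  # True
```

The threshold itself is rescaled, but the rows that end up on each side of the split are identical, so the resulting tree makes the same predictions.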
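One way to see the outlier claim, sketched under the assumption that the outlier is in a feature (not the target) and that all values are distinct: the set of partitions a single threshold can produce depends only on the ordering of the values. The data and the `candidate_partitions` helper below are hypothetical.

```python
from statistics import mean

def candidate_partitions(values):
    """Every set of row indices that a single threshold split can send left.

    Assumes all values are distinct.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    # A threshold between consecutive sorted values sends the first k rows left.
    return [frozenset(order[:k]) for k in range(1, len(order))]

clean = [20, 35, 50, 65, 80, 95]
with_outlier = [20, 35, 50, 65, 80, 9500]  # last value replaced by an extreme one

# The ordering is unchanged, so the tree has exactly the same splits to choose from.
same_splits = candidate_partitions(clean) == candidate_partitions(with_outlier)

# A magnitude-based statistic such as the mean, by contrast, moves a lot.
print(same_splits)         # True
print(mean(clean))         # 57.5
print(mean(with_outlier))  # 1625
```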
So, your answer should be:
Because decision trees divide items by comparing feature values against a split point, and not based on their absolute magnitude.
It is beautifully explained in the tutorials here along with time and space complexity :