Doubt Clarification - Regarding the impact of the range of feature values on training

[Question By User]

I have a query regarding how ML training works and the range of input feature values.

Suppose I have an input with 4 features. Two of them are in the range [-1, 1]; the other two are outside this range. After performing normalization, the other two features are in the same range as the first two. How is this beneficial during training of the weights and bias?

Suppose I have an input with 2 features, both in [-1, 1], but if we go in depth, one has an actual range of [-0.9, 0.9] and the other an actual range of [-0.09, 0.09]. Since the range of the 2nd feature is 10 times smaller than that of the first, will the feature with the larger range be given priority during training and end up with a larger weight/bias than the 2nd feature? Will this make the 1st feature more important than the 2nd during training, and afterwards during testing?


First of all, let me clarify how the training works. Let's take the case of linear regression.

The system tries to figure out the weights w0, w1, w2… such that it can predict y on the basis of features f1, f2, f3 etc.:

y = w0 + w1*f1 + w2*f2 + w3*f3 + …

And gradient descent basically tweaks these weights until the predicted y is close to the actual values, using a loss function such as MSE for regression or cross-entropy for classification.
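To make this concrete, here is a minimal sketch of that loop in NumPy. The data, the true weights, and the learning rate are all made up for illustration; the point is just that each step nudges the weights in the direction that reduces the MSE.

```python
import numpy as np

# Toy linear-regression fit with plain gradient descent (hypothetical data).
# Model: y = w0 + w1*f1 + w2*f2, loss = MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # 100 samples, features f1 and f2
true_w = np.array([3.0, 2.0, -1.0])    # [w0, w1, w2], chosen for the example
y = true_w[0] + X @ true_w[1:]         # noiseless targets

w = np.zeros(3)                        # start with all weights at zero
lr = 0.1
for _ in range(500):
    pred = w[0] + X @ w[1:]
    err = pred - y
    w[0] -= lr * 2 * err.mean()             # d(MSE)/d(w0)
    w[1:] -= lr * 2 * (X.T @ err) / len(y)  # d(MSE)/d(w1), d(MSE)/d(w2)

print(np.round(w, 2))  # close to [3, 2, -1]
```

Because the toy targets are noiseless, the recovered weights match the true ones almost exactly after a few hundred steps.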

Now, back to your question. It usually does not matter what the range of f1 is versus what the range of f2 is.

If a feature's values, say f1's, are very tiny, gradient descent will figure out that the weight w1 needs to be large. Similarly, if a feature's values are really big, it will figure out that the corresponding weight needs to be small.
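You can verify this compensation directly. In the sketch below (toy data of my own construction, fitted with ordinary least squares rather than gradient descent, which reaches the same solution), shrinking a feature's range by 10x simply makes its fitted weight 10x larger, while the predictions stay identical:

```python
import numpy as np

# Sketch: the fitted weight compensates for feature scale.
rng = np.random.default_rng(1)
f1 = rng.uniform(-0.9, 0.9, size=200)
f2 = rng.uniform(-0.9, 0.9, size=200)
y = 1.0 + 2.0 * f1 + 2.0 * f2          # both features equally important

def fit(cols):
    """Least-squares fit of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + cols)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

w_orig = fit([f1, f2])
w_scaled = fit([f1, f2 / 10])          # shrink f2's range to [-0.09, 0.09]

print(w_orig)    # ~[1, 2, 2]
print(w_scaled)  # ~[1, 2, 20] -- the weight grew 10x to compensate
```

So the feature with the smaller range is not treated as less important; its weight absorbs the scale difference.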

Now, the question is: why do we really need to normalize? Because it helps gradient descent converge faster. When features are on very different scales, a single learning rate cannot suit all the weights at once: it must be kept small enough for the large-scale feature, which makes progress on the small-scale feature very slow. So, if you do not normalize, training can take a long time to converge and you may need many more iterations.
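The sketch below demonstrates this effect on made-up data where one feature's scale is 100x the other's. The specific scales, learning rates, and tolerance are arbitrary choices for the illustration; the qualitative outcome is what matters:

```python
import numpy as np

# Sketch: gradient descent needs far fewer steps on standardized features.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2)) * np.array([100.0, 1.0])  # mismatched scales
y = X @ np.array([0.03, 2.0])

def steps_to_fit(X, y, lr, tol=1e-6, max_steps=100_000):
    """Run gradient descent on MSE; return steps taken to reach tol."""
    w = np.zeros(X.shape[1])
    for step in range(1, max_steps + 1):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        if np.mean((X @ w - y) ** 2) < tol:
            return step
    return max_steps

Xn = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
yn = y - y.mean()                          # center the target to match

raw_steps = steps_to_fit(X, y, lr=1e-5)    # tiny lr needed for stability
norm_steps = steps_to_fit(Xn, yn, lr=0.1)
print(raw_steps, norm_steps)               # normalized needs far fewer steps
```

On the raw features, the learning rate must stay tiny or the large-scale feature's weight diverges, so the other weight crawls; on standardized features a much larger learning rate is stable and the fit converges in a handful of steps.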

I hope I was able to answer your question.