If the model is trained on scaled X_train values, it will predict well on scaled X_train or X_test, provided the same scaling parameters are used for both.

A model learns to predict y from X, which can be scaled or not. If you train a model on scaled X values and test it on unscaled ones, the results will be bad.
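As a minimal sketch of what "same scaling parameters" means: the snippet below standardizes the features using the mean and standard deviation computed from the training data only, then applies those same parameters to the test data. The helper names (`fit_scaler`, `transform`) and the sample numbers are made up for illustration; a library scaler would normally do this job.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, std) computed from the given training values."""
    return mean(values), pstdev(values)

def transform(values, params):
    """Standardize values with previously fitted parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

X_train = [10.0, 20.0, 30.0]  # illustrative data, not from the question
X_test = [20.0, 40.0]

params = fit_scaler(X_train)                 # fit on the training data only
X_train_scaled = transform(X_train, params)
X_test_scaled = transform(X_test, params)    # reuse the same parameters
```

Fitting a second scaler on X_test would produce different parameters, so the model would see test inputs on a different scale from the one it was trained on.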

If you scale the values of y, chances are that the results will be erroneous, especially if the relationship between X and y is non-linear.

Imagine the relationship between X and y is this:

    if X < 20:
        y = X*5 - 10
    else:
        y = X*3 - 5

    X,  y
    15, 65
    21, 58

Say we scale the values so that y lies between 0 and 1: y_scaled = (y - 58)/(65 - 58)

    X,  y_scaled
    15, 1
    21, 0

If you think about it, translating y_scaled back to y involves multiplying by (65 - 58) and adding 58. This would not give great results because y is not linearly dependent on X.
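The arithmetic of this example can be checked with a short sketch. It assumes min-max scaling, which maps 65 to 1 and 58 to 0 as in the table above. The function name `f` is hypothetical, and X = 20 is assigned to the second branch. The snippet only verifies the scaling and unscaling steps; it does not train a model.

```python
def f(x):
    """Piecewise relationship from the example (x >= 20 uses the second branch)."""
    return x * 5 - 10 if x < 20 else x * 3 - 5

X = [15, 21]
y = [f(x) for x in X]

# Min-max scaling of the target: subtract the minimum, divide by the range.
y_min, y_max = min(y), max(y)
y_scaled = [(v - y_min) / (y_max - y_min) for v in y]

# Inverse transform: multiply by the range, then add the minimum back.
y_restored = [s * (y_max - y_min) + y_min for s in y_scaled]

print(y, y_scaled, y_restored)
```

To see how a model behaves, you would replace `y_scaled` in the inverse step with the model's predictions made in the scaled space.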

Try running an experiment with the above example using a neural network model.