Normalize data and output labels

Annapurna · April 24, 2019, 10:23am

Hi

After normalizing the data and training the model, the resulting dataset is in scaled units. How do I get the values back in their original units?

For example, while predicting housing prices in California, I used min-max scaling. The resulting Scored Dataset in AzureML shows the output labels between 0 and 1. How do I get the labels in regular price terms?

Thanks

Deepak_Singh · April 25, 2019, 9:31am

Hi Annapurna,

When normalizing/scaling the training (or test) dataset, please note that the ‘label’ column (the value to be predicted) should be dropped first. So, for the ‘California housing prices’ dataset, drop the ‘median_house_value’ column from your training/test dataset and then apply normalization/scaling to the remaining feature columns. The predicted value that you get from ‘model.predict(scaled_training_dataset)’ will then not be in normalized/scaled form; it will be in the original units of the label, because the model was trained against unscaled labels, and you can compare it directly with the ‘label’ value.
Hence, you will not need to convert normalized/scaled data back to its original form. In any case, you still have the original dataset, because normalization/scaling creates a new ‘scaled’ dataset and leaves the original as it is.
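
To illustrate the point above, here is a minimal sketch in plain Python (not the course's or AzureML's code). It assumes a made-up one-feature dataset; the helper names min_max_scale and fit_line and the toy numbers are hypothetical stand-ins for a scaler and a regression model. Only the feature is scaled to [0, 1]; the labels stay in price units, so the predictions come out in price units as well.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: one feature (income) and the label (house price).
income = [1.5, 3.0, 4.5, 6.0, 8.0]
price = [90000.0, 180000.0, 270000.0, 360000.0, 480000.0]

# Scale the feature only; the label is left in its original units.
scaled_income = min_max_scale(income)

# Train on (scaled feature, unscaled label).
slope, intercept = fit_line(scaled_income, price)

# Predict from scaled inputs; the outputs are in price units.
predictions = [slope * x + intercept for x in scaled_income]
print(predictions)
```

Printing predictions should show values on the scale of the prices rather than values between 0 and 1, since the fitted slope and intercept absorb the feature scaling.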

Annapurna · April 25, 2019, 10:01am

Hi

I removed the label from feature scaling as told in the course, but that is exactly my question. If all the feature values are between 0 and 1, the model output would also be in these terms, right? How does the model back-calculate and provide output in regular terms?

Thanks
Purna

Deepak_Singh · April 25, 2019, 1:04pm

Hi Annapurna,

Even though the input to the model's predict() method is a ‘scaled’/‘normalized’ dataset, the output (predicted values) of predict() is not normalized/scaled; it is in the original units. The model takes care of this internally: during training it learns parameters that map the scaled features to the unscaled labels, so no back-calculation is needed.
Also, the actual label values (y) should not be scaled/normalized.
Thus, we are able to use this predicted value (y-cap) along with the actual value (y) to calculate our performance measures (like accuracy, MSE, etc.). Both y-cap and y are unscaled (not normalized) values.
Please try printing the predicted values (the output of the predict() method).
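
If the label column was itself min-max scaled (which would explain scored labels between 0 and 1), the predictions can be mapped back by inverting the scaling formula. Below is a minimal sketch, assuming the label's original minimum and maximum were kept; the function name and all numbers are hypothetical. scikit-learn's MinMaxScaler provides an inverse_transform method that serves the same purpose.

```python
def min_max_inverse(scaled_values, lo, hi):
    """Undo min-max scaling: x = x_scaled * (hi - lo) + lo."""
    return [s * (hi - lo) + lo for s in scaled_values]


# Hypothetical minimum and maximum of the original (unscaled) label column.
label_min, label_max = 90000.0, 480000.0

# Hypothetical model outputs that are still in the scaled [0, 1] range.
scaled_predictions = [0.0, 0.25, 0.5, 1.0]

# Map the predictions back to regular price terms.
restored = min_max_inverse(scaled_predictions, label_min, label_max)
print(restored)
```

The same minimum and maximum that were used to scale the label must be used to invert it; values recomputed from a different dataset would give wrong prices.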

Deepak_Singh · April 25, 2019, 1:14pm

Also, different ML algorithms (models) use different logic to find the predicted value of the label (y-cap). For example, for a classification problem, Softmax Regression uses the probability values of the classes to decide the output (predicted) class: the class with the highest probability is the predicted class.
Please go through the later chapters for a better understanding of the different ML algorithms, their cost functions and how they work.
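
The softmax decision rule described above can be sketched in a few lines of plain Python. This is only an illustration under assumed inputs: the function names and the example scores are hypothetical, and a real classifier would compute the per-class scores from the features.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical raw scores for three classes.
example_scores = [2.0, 1.0, 0.1]
print(softmax(example_scores))
print(predict_class(example_scores))
```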