Predict the bike demand in the future

Correlation of each feature with the label (cnt):
season 0.180080
yr 0.249280
mnth 0.124786
hr 0.395162
holiday -0.031149
weekday 0.027396
workingday 0.031818
weathersit -0.140356
temp 0.400696
hum -0.320436
windspeed 0.087335
cnt 1.000000
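A correlation table like the one above can be produced with pandas. The sketch below uses a small synthetic DataFrame as a stand-in for the actual bike-sharing data, so the names match the report but the values will not:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the bike-sharing data; the real analysis
# used columns such as season, hr, temp, hum, and the label cnt.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "hr": rng.integers(0, 24, n),
    "temp": rng.random(n),
    "hum": rng.random(n),
})
# Make cnt correlate positively with temp and negatively with hum,
# mirroring the signs seen in the report
df["cnt"] = 100 + 80 * df["temp"] - 60 * df["hum"] + rng.normal(0, 10, n)

# Correlation of every column with the label, sorted descending
corr_with_cnt = df.corr()["cnt"].sort_values(ascending=False)
print(corr_with_cnt)
```

The label always correlates perfectly with itself (1.000000), which is why cnt tops the list.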

Linear Regression:
RMSE: 140.69
CV RMSE: 140.92

Decision Tree:
RMSE: 0.598
CV RMSE: 60.30

Random Forest:
RMSE: 15.964
CV RMSE: 43.51

Final RMSE: 49.57

Features:
yr, mnth, hr, holiday, weekday, workingday, weathersit, temp, hum, windspeed

Label:
cnt

Used stratified sampling on season

Used Standardization for feature scaling
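Standardization rescales each numeric feature to zero mean and unit variance. A sketch using scikit-learn's StandardScaler (the array below is a toy stand-in for the scaled features such as temp, hum, and windspeed):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-in for three numeric features on very different scales
rng = np.random.default_rng(2)
X = rng.random((100, 3)) * [1.0, 100.0, 67.0]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After standardization each column has mean ~0 and std ~1
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

Fitting the scaler on the training set only (and reusing it on the test set) avoids leaking test-set statistics into training.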

Selected the Linear Regression, Random Forest, and Decision Tree algorithms

RMSE:
Decision Tree - 0.599
Random Forest Model - 16.136
Linear Regression Model - 140.425

Cross Validation Mean RMSE:
Decision Tree - 61.755
Random Forest Model - 43.854
Linear Regression Model - 140.665
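The per-model cross-validation RMSE comparison above follows a standard pattern. A sketch assuming scikit-learn (the data is synthetic, so the numbers will not match the report):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data as a stand-in for the bike-sharing features
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_rmse = {}
for name, model in models.items():
    # scikit-learn negates MSE so that higher is better; undo that and
    # take the square root per fold, then average over the 10 folds
    scores = cross_val_score(model, X, y,
                             scoring="neg_mean_squared_error", cv=10)
    cv_rmse[name] = np.sqrt(-scores).mean()
    print(f"{name}: CV mean RMSE = {cv_rmse[name]:.2f}")
```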

Random Forest is chosen as the best model.
Final RMSE: 52.011

The aim is to predict the bike demand in the future.

BootML was used to build the model.

Observation:

Linear Regression:

RMSE: 749.263

Cross Validation Mean: 791.954

Cross Validation SD: 115.856

Decision Tree:

RMSE: 0.0

Cross Validation Mean: 967.165

Cross Validation SD: 195.063

Random Forest:

RMSE: 259.807

Cross Validation Mean: 724.681

Cross Validation SD: 150.572

Final RMSE: 719.986

Conclusion

    1. The Decision Tree is overfitting: its training RMSE is 0.0 while its cross-validation mean is 967.165.
    2. Based on the cross-validation RMSE, Random Forest is the best fit. After fine-tuning, the final RMSE is 719.986.
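The overfitting symptom called out above, a near-zero training RMSE paired with a large cross-validation RMSE, can be demonstrated on synthetic data. A sketch assuming scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: an unconstrained tree memorizes the training set
# (RMSE 0.0) yet generalizes poorly under cross-validation.
X, y = make_regression(n_samples=200, n_features=10, noise=30.0, random_state=0)

tree = DecisionTreeRegressor(random_state=0)
tree.fit(X, y)
train_rmse = np.sqrt(mean_squared_error(y, tree.predict(X)))

cv_scores = cross_val_score(tree, X, y,
                            scoring="neg_mean_squared_error", cv=10)
cv_rmse = np.sqrt(-cv_scores).mean()

print(f"Training RMSE: {train_rmse:.3f}")  # ~0: memorized the training set
print(f"CV mean RMSE:  {cv_rmse:.3f}")     # much larger: poor generalization
```

This is why the model comparison above relies on the cross-validation mean rather than the training RMSE.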

Objective: Predict the bike demand in the future

I used BootML to build, select, and fine-tune the model, using the day.csv file for training. The details of the activity are highlighted below:

    1. Discarded the following columns: instant, dteday, atemp, casual, and registered
    2. Used cnt as the label and the rest as features
    3. Split the data into training and test sets in the ratio 80:20
    4. Kept the following fields as categorical: season, yr, mnth, holiday, weekday, workingday, and weathersit
    5. Scaled the following features using the standardization method: temp, hum, and windspeed
    6. Selected the Linear Regression, Random Forest, and Decision Tree algorithms
    7. Enabled hyper-parameter tuning using grid search
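The grid-search step can be sketched as follows, assuming scikit-learn's GridSearchCV. The parameter grid and data here are hypothetical stand-ins, since the post does not show BootML's actual search space:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real run tuned on the day.csv features
X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

# Hypothetical grid of Random Forest hyper-parameters
param_grid = {
    "n_estimators": [10, 30],
    "max_features": [4, 6, 8],
}

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid_search.fit(X, y)

best_rmse = np.sqrt(-grid_search.best_score_)
print("Best params:", grid_search.best_params_)
print(f"Best CV RMSE: {best_rmse:.3f}")
```

The best estimator found this way is then evaluated once on the held-out test set to produce the final RMSE reported below.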

Observation:

Linear Regression:

RMSE: 749.688

Cross Validation Mean: 788.058

Cross Validation SD: 119.255

Decision Tree:

RMSE: 0.0

Cross Validation Mean: 999

Cross Validation SD: 193.095

Random Forest:

RMSE: 259.848

Cross Validation Mean: 730.428

Cross Validation SD: 161.052

Final RMSE: 687.968

Findings:

    1. Random Forest has the lowest RMSE.
    2. The cross-validation mean for Random Forest is lower than for Linear Regression, suggesting better generalization.
    3. The Decision Tree is overfitting (training RMSE of 0.0 against a cross-validation mean of 999).
    4. Based on these results, the Random Forest model was chosen for fine-tuning, with a final RMSE of 687.968.