Project Name => Predict Bike Demand in Future
Tool => BootML
Model => Supervised learning with regression algorithms. The performance measure is mean squared error; the results below are reported as its square root (RMSE).
DataSet => Created the dataset bikeDataSet and uploaded the day.csv data file (type: CSV). It is located at "/home/ranjitpandey834075/BootML/Datasets/bikeDataSet_1/"
Data Cleanup, Columns Discarded => instant, dteday, atemp, casual and registered
Label => cnt field
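The cleanup and label selection above can be sketched in pandas as follows. This is only an illustration: the three rows are made-up stand-in values using the day.csv column names, and the variable names (bike, features, labels) are hypothetical. Assuming this is the UCI Bike Sharing day.csv, cnt is the sum of casual and registered, so keeping those two columns would leak the label into the features.

```python
import pandas as pd

# Tiny hypothetical stand-in with the same column names as day.csv.
# The values are for illustration only.
bike = pd.DataFrame({
    "instant": [1, 2, 3],
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03"],
    "season": [1, 1, 1],
    "yr": [0, 0, 0],
    "mnth": [1, 1, 1],
    "holiday": [0, 0, 0],
    "weekday": [6, 0, 1],
    "workingday": [0, 0, 1],
    "weathersit": [2, 2, 1],
    "temp": [0.34, 0.36, 0.20],
    "atemp": [0.36, 0.35, 0.19],
    "hum": [0.81, 0.70, 0.44],
    "windspeed": [0.16, 0.25, 0.25],
    "casual": [331, 131, 120],
    "registered": [654, 670, 1229],
    "cnt": [985, 801, 1349],
})

# Columns discarded during cleanup, as listed in these notes.
discarded = ["instant", "dteday", "atemp", "casual", "registered"]

# Features are everything except the discarded columns and the label.
features = bike.drop(columns=discarded + ["cnt"])
labels = bike["cnt"].copy()

print(list(features.columns))
```

Ten feature columns remain after this step: season, yr, mnth, holiday, weekday, workingday, weathersit, temp, hum and windspeed.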
The dataset is split into a training set and a test set in the ratio 80:20 with random seed = 42
Stratified sampling => weathersit
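A minimal sketch of an 80:20 split with seed 42, stratified on weathersit, using scikit-learn's train_test_split. BootML may generate different code (for example StratifiedShuffleSplit), so treat this as one possible equivalent. The data is synthetic, and the row count of 731 is an assumption based on the UCI day.csv file; it is not stated in these notes.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; 731 rows is an assumed size for day.csv.
rng = np.random.default_rng(0)
n = 731
bike = pd.DataFrame({
    "weathersit": rng.choice([1, 2, 3], size=n, p=[0.63, 0.34, 0.03]),
    "temp": rng.random(n),
    "cnt": rng.integers(20, 9000, size=n),
})

# 80:20 split, seed 42, keeping the weathersit proportions
# the same in both the training set and the test set.
train_set, test_set = train_test_split(
    bike, test_size=0.2, random_state=42, stratify=bike["weathersit"]
)

print(len(train_set), len(test_set))  # 584 147
```

With 731 rows, the test set is rounded up to 147 rows, which leaves 584 training rows.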
Categorical Fields => mnth, yr
Algorithms Used => Linear Regression, Random Forest and Decision Tree were each trained and compared by RMSE. The final model was then built by fine-tuning hyperparameters with Grid Search.
bikeDataSet_prepared.shape => (584, 22)
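The 22 columns are consistent with one-hot encoding the two categorical fields: 8 remaining features + 12 columns for mnth + 2 columns for yr = 22. The sketch below checks that arithmetic on synthetic data with pandas get_dummies. It is an assumption that BootML encodes the fields this way, and any scaling of numeric columns it may apply is not shown.

```python
import numpy as np
import pandas as pd

# Synthetic training data with the 10 feature columns kept after cleanup.
rng = np.random.default_rng(42)
n = 584
train = pd.DataFrame({
    "season": rng.integers(1, 5, n),
    "yr": np.arange(n) % 2,          # both years present
    "mnth": np.arange(n) % 12 + 1,   # all 12 months present
    "holiday": rng.integers(0, 2, n),
    "weekday": rng.integers(0, 7, n),
    "workingday": rng.integers(0, 2, n),
    "weathersit": rng.integers(1, 4, n),
    "temp": rng.random(n),
    "hum": rng.random(n),
    "windspeed": rng.random(n),
})

# One-hot encode the categorical fields; every other column passes through.
prepared = pd.get_dummies(train, columns=["mnth", "yr"])

print(prepared.shape)  # (584, 22)
```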
Train a Linear Regression model =>
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
RMSE of the Linear Regression model => 768.5944724413282
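A sketch of how an RMSE like this is typically obtained: fit the model, predict on the same data, and take the square root of the mean squared error. It is an assumption that the value above was measured on the training set. The data here is synthetic, so the printed number will not match the reported one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the prepared training data and labels.
rng = np.random.default_rng(42)
X = rng.random((584, 22))
y = X @ rng.random(22) * 1000 + rng.normal(0, 50, 584)

lin_reg = LinearRegression()
lin_reg.fit(X, y)

# RMSE = sqrt(mean((y - y_hat)^2)), here measured on the training data.
predictions = lin_reg.predict(X)
lin_rmse = np.sqrt(mean_squared_error(y, predictions))
print(lin_rmse)
```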
K-fold Cross Validation for Linear Regression =>
Scores: [696.03321432 750.45535426 960.59584706 695.0034799 981.00959193
830.59905649 825.37748752 975.82091261 640.40318235 607.21225263]
Mean: 796.251037906218
Standard deviation: 133.3546238995723
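The ten scores suggest 10-fold cross-validation. A minimal sketch, assuming cross_val_score with the neg_mean_squared_error scoring (scikit-learn returns negated MSE, so the sign is flipped before taking the square root). The model is fitted on synthetic data, and display_scores is a hypothetical helper. The last lines recompute the mean and standard deviation from the per-fold values reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def display_scores(scores):
    # Hypothetical helper that prints a summary in the format used above.
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

# Synthetic stand-in data.
rng = np.random.default_rng(42)
X = rng.random((584, 5))
y = X @ np.array([4000.0, 1500.0, 800.0, 300.0, 100.0]) + rng.normal(0, 700, 584)

# Scores are negated MSE values, one per fold.
neg_mse = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_squared_error", cv=10
)
rmse_scores = np.sqrt(-neg_mse)
display_scores(rmse_scores)

# Per-fold RMSE values reported in these notes for Linear Regression.
reported = np.array([
    696.03321432, 750.45535426, 960.59584706, 695.0034799, 981.00959193,
    830.59905649, 825.37748752, 975.82091261, 640.40318235, 607.21225263,
])
reported_mean = reported.mean()
reported_std = reported.std()  # population standard deviation (ddof=0)
print(reported_mean, reported_std)
```

The reported standard deviation matches NumPy's default population formula (ddof=0), not the sample formula.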
K-fold Cross Validation for Decision Tree Model =>
Scores: [ 763.21421122 889.43738645 1014.79785914 901.05760835 1306.80820691
857.39525348 1036.17299015 1255.78573425 1175.76949858 848.94624337]
Mean: 1004.9384991896061
Standard deviation: 176.909767307545
RMSE of the Random Forest model => 256.55570549443365
K-fold Cross Validation for Random Forest Model =>
Scores: [ 524.46036498 663.81704947 848.49415914 571.27245884 1077.40721374
676.34221889 695.49285665 890.15660905 627.14292476 645.30993013]
Mean: 721.9895785645234
Standard deviation: 159.18044852283458
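The Random Forest RMSE of about 257 is far below its cross-validation mean of about 722. If the first figure was measured on the training data (an assumption), the gap suggests the forest fits the training set much more closely than unseen folds. The sketch below reproduces that comparison on synthetic data; n_estimators=30 is an arbitrary choice, not a setting taken from these notes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with a nonlinear signal plus noise.
rng = np.random.default_rng(42)
X = rng.random((300, 6))
y = 3000 * np.sin(3 * X[:, 0]) + 2000 * X[:, 1] + rng.normal(0, 500, 300)

forest = RandomForestRegressor(n_estimators=30, random_state=42)

# RMSE on the same data the model was trained on.
forest.fit(X, y)
train_rmse = np.sqrt(mean_squared_error(y, forest.predict(X)))

# RMSE estimated on held-out folds.
neg_mse = cross_val_score(forest, X, y, scoring="neg_mean_squared_error", cv=10)
cv_rmse = np.sqrt(-neg_mse)

print("Training RMSE:", train_rmse)
print("Cross-validation mean RMSE:", cv_rmse.mean())
```

The cross-validation mean is the fairer figure to compare across the three models.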
RMSE score of each hyperparameter combination tested during the grid search =>
714.9833781202965 {'max_features': 6, 'n_estimators': 10}
687.7435814930792 {'max_features': 6, 'n_estimators': 30}
724.7979163572528 {'max_features': 8, 'n_estimators': 10}
687.4134063083247 {'max_features': 8, 'n_estimators': 30}
686.3799212049184 {'bootstrap': False, 'max_features': 7, 'n_estimators': 20}
677.6198719863118 {'bootstrap': False, 'max_features': 7, 'n_estimators': 35}
717.6562284609751 {'bootstrap': False, 'max_features': 9, 'n_estimators': 20}
707.9118750457284 {'bootstrap': False, 'max_features': 9, 'n_estimators': 35}
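The eight combinations listed above correspond to a two-part parameter grid. A sketch of such a search with GridSearchCV on a RandomForestRegressor follows. The grid values are read off the listing; the synthetic data, cv=5 and random_state=42 are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with 10 features.
rng = np.random.default_rng(42)
X = rng.random((200, 10))
y = 4000 * X[:, 0] + 1500 * X[:, 1] + rng.normal(0, 300, 200)

# Two sub-grids: 2 x 2 combinations with the default bootstrap,
# then 2 x 2 combinations with bootstrap disabled.
param_grid = [
    {"n_estimators": [10, 30], "max_features": [6, 8]},
    {"bootstrap": [False], "n_estimators": [20, 35], "max_features": [7, 9]},
]

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,  # assumed fold count
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)

# Convert each mean negated MSE into an RMSE and print it with its parameters.
cv_results = grid_search.cv_results_
for mean_score, params in zip(cv_results["mean_test_score"], cv_results["params"]):
    print(np.sqrt(-mean_score), params)

print("Best:", grid_search.best_params_)
```

In the scores reported above, the lowest RMSE (677.62) belongs to bootstrap=False, max_features=7, n_estimators=35.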
Importance score of each attribute from the best GridSearchCV model =>
[(0.2902867422499126, 'temp'),
(0.19624915015754796, 'yr'),
(0.17332577979020708, 'season'),
(0.09894612274579834, 'mnth'),
(0.05776152697143375, 'hum'),
(0.037425915375431, 'windspeed'),
(0.03600668156127903, 'weathersit'),
(0.020480467635953403, 'weekday'),
(0.006266844439114171, 'workingday'),
(0.0033767127366916465, 'holiday')]
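A ranked list like this can be produced by pairing a fitted forest's feature_importances_ with the attribute names and sorting. The sketch below does that on synthetic data where temp is deliberately made the dominant signal. The last lines sum the values reported above: they total about 0.92 rather than 1, which may mean that some columns of the 22-column prepared data (for example one-hot encoded ones) are not included in the listing. That explanation is a guess and is not confirmed by these notes.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

attributes = ["season", "yr", "mnth", "holiday", "weekday",
              "workingday", "weathersit", "temp", "hum", "windspeed"]

# Synthetic stand-in data; cnt is driven mostly by temp by construction.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame(rng.random((n, len(attributes))), columns=attributes)
X["yr"] = rng.integers(0, 2, n)
y = 5000 * X["temp"] + 1000 * X["yr"] + rng.normal(0, 100, n)

forest = RandomForestRegressor(n_estimators=30, random_state=42)
forest.fit(X, y)

# Pair each importance with its attribute name, highest first.
importances = forest.feature_importances_
ranked = sorted(zip(importances, attributes), reverse=True)
for score, name in ranked:
    print(score, name)

# Importances reported in these notes.
reported = [
    0.2902867422499126, 0.19624915015754796, 0.17332577979020708,
    0.09894612274579834, 0.05776152697143375, 0.037425915375431,
    0.03600668156127903, 0.020480467635953403, 0.006266844439114171,
    0.0033767127366916465,
]
reported_total = sum(reported)
print("Sum of reported importances:", reported_total)
```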
final_rmse => 674.170774876122