Hi Nishant,
To start with, I had a similar question posted in the forum and did not get any reply even after two weeks, so I have resigned myself to the fact that this is probably how it is. I am yet to figure out the best way to get queries answered, except of course in class when they are answered. But then again, I don't feel it's right to take up a lot of other attendees' time if the query is not on the topic of that day's session.
Anyway, coming to the problem you are trying to solve: this is something I was also trying to solve and was able to make some headway on, so I think it may be of help. I got 96% accuracy on the validation set and 98% on training (in fact it finally reached 100%, so it overfitted), and I see 86% when I run the test set.
The steps are pretty much the same, but a few things stand out on this dataset:
- The number of images is too small for the model to learn properly, and when we split into training and validation it becomes even smaller.
- And since we split statically (out of 209 images we take roughly 29 for validation and 180 for training), the same train and validation sets are used in every epoch, so I feel the model is not really learning. (I wanted to ask someone about this, but as I have not received an answer to my initial question, I am not sure there is much point in asking.)
So the changes I made are basically two things:
- Increased the training dataset size by doing image augmentation, to take care of the first point about dataset size mentioned above. You can do this in multiple ways (flip, brighten, etc.), but what I did was flip each image, creating another 209 images from the flipped set. So the overall dataset size for the model to learn from doubled to 418.
Code below -
import numpy as np

# Horizontally flip every image (axis=2 is the width axis of a
# (num_images, height, width, channels) array)
train_set_x_orig_flip = np.flip(train_set_x_orig, axis=2)

# Stack the originals and the flipped copies: 209 + 209 = 418 images.
# Note: concatenate the ORIGINAL set with the flipped set,
# not the flipped set with itself.
train_set_x_orig = np.concatenate((train_set_x_orig, train_set_x_orig_flip), axis=0)

# The labels are unchanged by flipping, so just duplicate them
train_set_y_orig = np.concatenate((train_set_y_orig, train_set_y_orig))
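If you want to try one of the other augmentations I mentioned (brightening), here is a minimal NumPy-only sketch. The factor of 1.2 is just an illustrative choice, and `brighten` is a helper name I made up, not anything from the assignment:

```python
import numpy as np

def brighten(images, factor=1.2):
    """Scale pixel intensities by `factor` and clip back to the valid 0-255 range."""
    brightened = images.astype(np.float64) * factor
    return np.clip(brightened, 0, 255).astype(images.dtype)

# Example: a tiny fake batch of 2 RGB images of size 4x4
batch = np.random.randint(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)
brighter = brighten(batch)
print(brighter.shape)  # same shape as the input batch
```

You could concatenate the brightened copies the same way as the flipped ones to grow the training set further.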
- While training the model, to shuffle the train and validation rows a bit, I wrapped model.fit in k-fold cross-validation (to take care of the second issue mentioned above).
Code as below -
from sklearn.model_selection import StratifiedKFold

X = train_set_x_orig_normalized
Y = train_set_y_orig

# StratifiedKFold keeps the class balance roughly the same in each fold
kFold = StratifiedKFold(n_splits=10)
for train, valid in kFold.split(X, Y):
    history = model.fit(X[train], Y[train], epochs=5,
                        validation_data=(X[valid], Y[valid]))
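One small tweak worth trying (my suggestion, not something I verified on this exact dataset): by default StratifiedKFold takes folds from contiguous blocks of the arrays, so passing shuffle=True mixes the rows before splitting. A self-contained sketch with dummy stand-in arrays, just to show the fold shapes:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Dummy stand-ins for the real dataset, matching its 209-image size
X = np.random.rand(209, 64, 64, 3)
Y = np.random.randint(0, 2, size=209)

# shuffle=True randomizes row order before splitting;
# random_state makes the folds reproducible
kFold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = [len(valid) for _, valid in kFold.split(X, Y)]
print(fold_sizes)  # each validation fold holds roughly 1/10 of the 209 rows
```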
BTW, why do you have the last Dense layer with two nodes? The output is just one, as it is a binary classification problem, right? Your y dataset is also just one attribute with values 1 or 0. If you make this change, you will need to use binary_crossentropy as the loss as well.
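To make that last point concrete, here is a minimal sketch of the kind of head I mean. Everything before the final layer is a placeholder (I don't know your actual architecture), and the 64x64x3 input shape is an assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder body; only the final Dense layer is the point here
model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),        # assuming 64x64 RGB inputs
    layers.Flatten(),
    layers.Dense(16, activation="relu"),    # illustrative hidden layer
    layers.Dense(1, activation="sigmoid"),  # ONE node for binary classification
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",   # pairs with the single sigmoid output
              metrics=["accuracy"])
print(model.output_shape)  # (None, 1)
```

With a single sigmoid node the model outputs a probability of the positive class directly, which matches your 0/1 labels without any one-hot encoding.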
Hope this helps. I feel you should get a better result with these changes. The plot will still not show the convergence you expect, I believe, because we are splitting into mini-batches. But the training and validation accuracy, the test-set evaluation, and finally the predictions should be much better.