Firstly, I would like to thank Sandeep Sir for taking time out for a wonderful discussion.
Primary points:
-
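The simple perceptron could not predict accurately on problems that are not linearly separable (XOR is the classic example), which is why the multi-layer perceptron was introduced. With a step activation, neither model worked well as a neural network, since the step function gives no useful gradient to learn from. That is why we had to use other activation functions, such as sigmoid, tanh, or ReLU.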
-
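Softmax Regressor - Used for normalizing the outputs of the last layer into probabilities that sum to 1. Otherwise the raw values in the last layer are unbounded and can look exaggerated, which makes them hard to compare across classes.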
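
To make the softmax point concrete, here is a minimal sketch in plain Python. The function name `softmax` and the sample scores are my own illustrative assumptions, not something taken from the discussion.

```python
import math

def softmax(scores):
    """Turn a list of raw last-layer scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    # This shift does not change the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a last layer with three classes.
raw_scores = [2.0, 1.0, 0.1]
probs = softmax(raw_scores)
print(probs)
```

The ordering of the classes is preserved (the largest raw score still gets the largest probability), but every value now lies between 0 and 1 and the values add up to 1.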