In deep learning, how to estimate the min number of training records?

In “traditional neural nets” such as back propagation or RBF with 1-3 hidden layers, in the past I have used a guideline for good generalization to have 10 times as many training records as free parameters, or neural net weights.

However, from what I have seen in deep learning papers, it seems like while DL can make good use of huge amounts of data, how well does it work with smaller amounts of data? What is a problem that is too small (in terms of the number of training records) for DL? Sure, convolutional NN have many duplicated weights for the shifting convolution that can be tied to one update. Yes, I am aware of drop out a % of a fully connected layer. Does Relu help with generalization more? My applications of interest are not CNN.

Any links to good related reading?

Imagenet competitions have 1000 image categories, but may have only a few hundred examples per target category. I assume there is a lot of generalization with lower level feature extraction.

One intuition I have (to be discussed) is that maybe I should look at the number of number of weights connecting two consecutive layers, especially if training autoencoder style, but I don’t know if that would generalize to non-encoder DL net configurations (for generalization and minimal training record estimations).


If by training records you mean training inputs, then with very less traning data , it will result in over fitting.

Hi Greg,

Could you rephrase your question? I could understand the subject but the remaining part of question helping in understanding the question.

There is no way to estimate the minimum number of training records.

It is the other way, we can prove that the bigger and diverse the data, the higher are the chances of better prediction. Say, if we are using, an ensemble of any algorithm with diverse dataset and weights, we would definitely get better results.

Also, if our algorithm is really good, we can improve our outcomes even for lesser amount of data but it is hard to estimate “how less”.