A systematic review on overfitting control in shallow and deep neural networks

Mohammad Mahdi Bejani, Mehdi Ghatee
DOI: https://doi.org/10.1007/s10462-021-09975-1
IF: 9.588
2021-03-03
Artificial Intelligence Review
Abstract: Shallow neural networks process the features directly, while deep networks extract features automatically during training. Both models suffer from overfitting, or poor generalization, in many cases. Deep networks include more hyper-parameters than shallow ones, which increases the probability of overfitting. This paper presents a systematic review of overfitting control methods and categorizes them into passive, active, and semi-active subsets. A passive method designs a neural network before training, while an active method adapts a neural network during the training process. A semi-active method redesigns a neural network when the training performance is poor. This review covers the theoretical and experimental backgrounds of these methods, their strengths and weaknesses, and the emerging techniques for overfitting detection. It also discusses the adaptation of model complexity to data complexity, and the relation between overfitting control, regularization, network compression, and network simplification. The paper ends with some concluding lessons from the literature.
computer science, artificial intelligence
What problem does this paper attempt to address?