Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization

Michael Kohler,Adam Krzyzak,Alisha Sänger
2024-04-11
Abstract:Image classification from independent and identically distributed random variables is considered. Image classifiers are defined which are based on a linear combination of deep convolutional networks with max-pooling layer. Here all the weights are learned by stochastic gradient descent. A general result is presented which shows that the image classifiers are able to approximate the best possible deep convolutional network. In case that the a posteriori probability satisfies a suitable hierarchical composition model it is shown that the corresponding deep convolutional neural network image classifier achieves a rate of convergence which is independent of the dimension of the images.
Statistics Theory
What problem does this paper attempt to address?
This paper discusses the method of learning deep convolutional network image classifiers using stochastic gradient descent and over-parameterization. In the task of image classification, the goal is to learn the function relationship between the input and output from independently and identically distributed random variables, where the input is an image and the output is the label of the image category. The paper focuses on deep convolutional networks based on linear combinations and learns the weights through stochastic gradient descent. The author first points out that although deep convolutional neural networks (CNNs) perform well in image classification, it is usually impractical to directly compute the empirical risk minimizer in practice. Instead, over-parameterized networks with more trainable parameters are used and optimized through gradient descent methods. The paper analyzes the over-parameterized network using stochastic gradient descent, logarithmic loss function, and maximum pooling, and demonstrates that dimension reduction can be achieved under this setup, and the convergence rate is independent of the image dimension. The main results of the paper show that if the posterior probability satisfies an appropriate layered maximum pooling model, the proposed deep convolutional neural network image classifier can achieve a convergence rate close to optimal. In addition, if the posterior probability is close to 1 or 0 on the images, the classifier will achieve a faster convergence rate. The paper also discusses related work, including optimization theory for stochastic gradient descent, properties of deep neural networks, generalization ability, and their applications in large-scale machine learning. Finally, the paper provides estimators for asymptotic and finite sample properties, and outlines the steps to prove these results. In conclusion, this paper aims to address how to effectively train deep convolutional networks for efficient classification of large-scale image datasets through stochastic gradient descent and over-parameterization, while overcoming the curse of dimensionality problem in the process.