Abstract: Image classification from independent and identically distributed random variables is considered. Image classifiers are defined which are based on a linear combination of deep convolutional networks with a max-pooling layer. Here all the weights are learned by stochastic gradient descent. A general result is presented which shows that the image classifiers are able to approximate the best possible deep convolutional network. If the a posteriori probability satisfies a suitable hierarchical composition model, it is shown that the corresponding deep convolutional neural network image classifier achieves a rate of convergence which is independent of the dimension of the images.

What problem does this paper attempt to address?

This paper studies how to learn deep convolutional network image classifiers by stochastic gradient descent combined with over-parameterization. In image classification, the goal is to learn the functional relationship between input and output from independent and identically distributed random variables, where the input is an image and the output is its class label. The classifiers considered are linear combinations of deep convolutional networks whose weights are all learned by stochastic gradient descent.
The authors first note that although deep convolutional neural networks (CNNs) perform well in image classification, computing the empirical risk minimizer directly is usually infeasible. In practice, over-parameterized networks with a large number of trainable parameters are fitted by gradient descent methods instead. The paper analyzes such an over-parameterized network trained with stochastic gradient descent, the logarithmic loss, and max-pooling, and shows that dimension reduction is possible in this setup: the rate of convergence does not depend on the image dimension.
The main results show that if the a posteriori probability satisfies a suitable hierarchical max-pooling model, the proposed deep convolutional neural network image classifier achieves a rate of convergence close to optimal. If, in addition, the a posteriori probability is close to 1 or 0 on the images, the classifier achieves a faster rate. The paper also discusses related work on optimization theory for stochastic gradient descent, properties of deep neural networks, generalization, and applications in large-scale machine learning. Finally, it states asymptotic and finite-sample properties of the estimators and outlines the steps of the proofs.
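
For orientation, the setting summarized above can be written in standard notation. This is only a sketch under assumptions that the summary does not state: labels are coded as $Y \in \{-1, 1\}$, $f_\theta$ denotes the network-based estimate with weight vector $\theta$, $\lambda_n$ is a step size, $j_t$ is the index of the sample used at step $t$, and $\hat\theta$ is the weight vector after the last step. The paper's exact definitions may differ.

```latex
% Data: (X, Y), (X_1, Y_1), \dots, (X_n, Y_n) i.i.d.; X is an image, Y \in \{-1, 1\} its class.
% A posteriori probability and Bayes classifier
\eta(x) = \mathbf{P}\{Y = 1 \mid X = x\}, \qquad
f^*(x) = \begin{cases} 1, & \eta(x) \ge 1/2, \\ -1, & \text{otherwise.} \end{cases}

% Logistic (logarithmic) loss and its empirical risk
\varphi(z) = \log\bigl(1 + e^{-z}\bigr), \qquad
F_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \varphi\bigl(Y_i \, f_\theta(X_i)\bigr)

% One stochastic gradient step, using the single sample with index j_t
\theta^{(t+1)} = \theta^{(t)} - \lambda_n \, \nabla_\theta \, \varphi\bigl(Y_{j_t} \, f_{\theta^{(t)}}(X_{j_t})\bigr)

% Resulting classifier and the excess misclassification risk whose rate is studied
\hat{C}_n(x) = \operatorname{sgn}\bigl(f_{\hat\theta}(x)\bigr), \qquad
\mathbf{P}\{\hat{C}_n(X) \ne Y \mid \text{data}\} - \mathbf{P}\{f^*(X) \ne Y\}
```
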
In conclusion, the paper addresses how deep convolutional networks can be trained by stochastic gradient descent with over-parameterization so that the resulting image classifiers are statistically efficient and circumvent the curse of dimensionality.
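
The training scheme described in this summary (a linear combination of convolutional networks with max-pooling, with all weights learned by stochastic gradient descent on a logarithmic loss) can be illustrated with a small, dependency-free sketch. It is not the paper's construction: each "network" here is a single convolution with ReLU and global max-pooling rather than a deep network, the model is tiny rather than over-parameterized, and the intercept `c`, the ±1 label coding, the synthetic blob data, the constant step size, and all function names are assumptions made for illustration.

```python
import math
import random


def conv_relu_maxpool(img, filt, bias):
    """One toy 'network': valid 2-D convolution, ReLU, then global max-pooling.
    Returns the pooled value and the top-left corner of the winning patch."""
    n, k = len(img), len(filt)
    best, best_pos = -math.inf, (0, 0)
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            s = bias + sum(filt[a][b] * img[i + a][j + b]
                           for a in range(k) for b in range(k))
            if s > best:
                best, best_pos = s, (i, j)
    # ReLU is monotone, so applying it after the max equals pooling the ReLU outputs.
    return max(best, 0.0), best_pos


def forward(params, img):
    """f(x) = c + sum_k w_k * net_k(x): a linear combination of the toy networks."""
    pooled = [conv_relu_maxpool(img, f, b)
              for f, b in zip(params["filters"], params["biases"])]
    out = params["c"] + sum(w * m for w, (m, _) in zip(params["w"], pooled))
    return out, pooled


def logistic_loss(y, out):
    """log(1 + exp(-y * out)), evaluated without overflow."""
    z = y * out
    return math.log1p(math.exp(-z)) if z > 0 else -z + math.log1p(math.exp(z))


def sgd_step(params, img, y, lr):
    """One stochastic gradient step on the logistic loss for a single sample."""
    out, pooled = forward(params, img)
    z = y * out
    # sigmoid(-z), computed without overflow
    sig = math.exp(-z) / (1.0 + math.exp(-z)) if z >= 0 else 1.0 / (1.0 + math.exp(z))
    g = -y * sig  # derivative of the loss with respect to the output f(x)
    k = len(params["filters"][0])
    for idx, (m, (i, j)) in enumerate(pooled):
        w = params["w"][idx]  # value before the update, used for the inner gradients
        params["w"][idx] -= lr * g * m
        if m > 0.0:  # gradient reaches the filter only if the ReLU is active
            for a in range(k):
                for b in range(k):
                    params["filters"][idx][a][b] -= lr * g * w * img[i + a][j + b]
            params["biases"][idx] -= lr * g * w
    params["c"] -= lr * g


def init_params(rng, num_nets, k):
    return {
        "filters": [[[rng.gauss(0, 0.5) for _ in range(k)] for _ in range(k)]
                    for _ in range(num_nets)],
        "biases": [0.0] * num_nets,
        "w": [0.0] * num_nets,  # outer weights of the linear combination
        "c": 0.0,               # intercept (an assumption of this sketch)
    }


def make_image(rng, label, n=5):
    """Synthetic data: noise everywhere; label +1 adds a bright 2x2 blob."""
    img = [[rng.gauss(0, 0.1) for _ in range(n)] for _ in range(n)]
    if label == 1:
        i, j = rng.randrange(n - 1), rng.randrange(n - 1)
        for a in range(2):
            for b in range(2):
                img[i + a][j + b] += 1.0
    return img


def mean_loss(params, data):
    return sum(logistic_loss(y, forward(params, x)[0]) for x, y in data) / len(data)


def demo(seed=0, num_nets=4, epochs=30, lr=0.05):
    """Train on 100 synthetic images; return mean training loss before and after."""
    rng = random.Random(seed)
    data = [(make_image(rng, y), y) for y in [1, -1] * 50]
    params = init_params(rng, num_nets, k=2)
    before = mean_loss(params, data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            sgd_step(params, x, y, lr)
    return before, mean_loss(params, data)
```

Calling `demo()` returns the mean training loss before and after training. Because the outer weights and the intercept start at zero, the initial loss equals log 2, and the update touches the outer weights as well as the filters and biases, mirroring the point that all weights are learned by stochastic gradient descent.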

Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization

Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent

Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss under the hierarchical max-pooling model

Stochastic Learning of Non-Conjugate Variational Posterior for Image Classification

Convergence of Stochastic Gradient Descent in Deep Neural Network

Novel Convergence Results of Adaptive Stochastic Gradient Descents

Multi-class Image Classification Based on Fast Stochastic Gradient Boosting

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

A novel statistical approach to analyze image classification

Classification of Stochastic Processes Based on Deep Learning

Image classification and retrieval with random depthwise signed convolutional neural networks

Calibrated Stochastic Gradient Descent for Convolutional Neural Networks

Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent

On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent

Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines

Wasserstein Pooling for Image Classification

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

A Stochastic Parallel Gradient Descent Algorithm for Person Re-identification

Enhancing Deep Stochastic Configuration Networks: Efficient Training via Low-Rank Matrix Approximation

An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks
