Abstract: Deep neural networks (DNNs) trained with the logistic loss (also known as the cross entropy loss) have made impressive advancements in various binary classification tasks. Despite this considerable success in practice, generalization analysis for binary classification with deep neural networks and the logistic loss remains scarce. The unboundedness of the target function for the logistic loss in binary classification is the main obstacle to deriving satisfactory generalization bounds. In this paper, we aim to fill this gap by developing a novel theoretical analysis and using it to establish tight generalization bounds for training fully connected ReLU DNNs with the logistic loss in binary classification. Our generalization analysis is based on an elegant oracle-type inequality which enables us to deal with the boundedness restriction of the target function. Using this oracle-type inequality, we establish generalization bounds for fully connected ReLU DNN classifiers $\hat{f}^{\mathrm{FNN}}_n$ trained by empirical logistic risk minimization with respect to i.i.d. samples of size $n$, which lead to sharp rates of convergence as $n \to \infty$. In particular, we obtain optimal convergence rates for $\hat{f}^{\mathrm{FNN}}_n$ (up to some logarithmic factor) only requiring the Hölder smoothness of the conditional class probability $\eta$ of data. Moreover, we consider a compositional assumption that requires $\eta$ to be the composition of several vector-valued multivariate functions of which each component function is either a maximum value function or a Hölder smooth function only depending on a small number of its input variables. Under this assumption, we can even derive optimal convergence rates for $\hat{f}^{\mathrm{FNN}}_n$ (up to some logarithmic factor) which are independent of the input dimension of data. This result explains why in practice DNN classifiers can overcome the curse of dimensionality and perform well in high-dimensional classification problems.
Furthermore, we establish dimension-free rates of convergence under other circumstances such as when the decision boundary is piecewise smooth and the input data are bounded away from it. Besides the novel oracle-type inequality, the sharp convergence rates presented in our paper are also due to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of rates by proving corresponding minimax lower bounds. All these results are new in the literature and will deepen our theoretical understanding of classification with deep neural networks.
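
For orientation, the unboundedness of the target function mentioned in the abstract can be made explicit with a standard calculation. The notation here is assumed for illustration and is not fixed by the abstract: labels $Y \in \{-1, 1\}$, logistic loss $\phi$, target function $f^*$, and $\eta(x) = P(Y = 1 \mid X = x)$.

$$
\phi(t) = \log\left(1 + e^{-t}\right), \qquad
f^*(x) = \arg\min_{t \in \mathbb{R}} \Big\{ \eta(x)\,\phi(t) + \big(1 - \eta(x)\big)\,\phi(-t) \Big\}
= \log\frac{\eta(x)}{1 - \eta(x)} .
$$

Setting the derivative of the conditional risk to zero gives $\eta(x) = (1 - \eta(x))\,e^{t}$, which yields the log-odds expression above. Under these assumptions, $f^*(x) \to +\infty$ as $\eta(x) \to 1$ and $f^*(x) \to -\infty$ as $\eta(x) \to 0$, so $f^*$ is unbounded whenever $\eta$ approaches $0$ or $1$. The logarithm in this expression is also consistent with the abstract's reference to approximating the natural logarithm near zero.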

Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes

Nonparametric regression using over-parameterized shallow ReLU neural networks

Classification with Deep Neural Networks and Logistic Loss

Optimal rates of approximation by shallow ReLU$^k$ neural networks and applications to nonparametric regression

On Excess Risk Convergence Rates of Neural Network Classifiers

Optimal Rates of Approximation by Shallow ReLU Neural Networks and Applications to Nonparametric Regression

Generalization Ability of Wide Neural Networks on $\mathbb{R}$

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

Generalization analysis of deep CNNs under maximum correntropy criterion

Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation

A global convergence theory for deep ReLU implicit networks via over-parameterization

Convergence of Deep Neural Networks with General Activation Functions and Pooling

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

Generalization Ability of Wide Residual Networks

Nonparametric logistic regression with deep learning

Universal Consistency of Deep ReLU Neural Networks

On the Rates of Convergence from Surrogate Risk Minimizers to the Bayes Optimal Classifier

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods