A Geometric Theory for Binary Classification of Finite Datasets by DNNs with Relu Activations

DOI: https://doi.org/10.1007/s11063-024-11612-1
IF: 2.565
2024-04-27
Neural Processing Letters
Abstract: In this paper we investigate deep neural networks for binary classification of datasets from a geometric perspective in order to understand the working mechanism of deep neural networks. First, we establish a geometrical result on the injectivity of a finite set under a projection from Euclidean space to the real line. Then, by introducing the notions of alternative points and alternative number, we propose an approach to design DNNs for binary classification of finite labelled points on the real line, thus proving the existence of a binary classification neural net whose hidden layers have width two and whose number of hidden layers is not larger than the cardinality of the finite labelled set. We also demonstrate geometrically how the dataset is transformed across every hidden layer in a narrow DNN setting for the binary classification task.
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper examines, from a geometric perspective, how deep neural networks (DNNs) with ReLU activations perform binary classification of finite datasets. The main question is how to design a DNN of the smallest possible size that classifies a given dataset.
The authors first establish a geometric result on the injectivity of a finite set under a projection from Euclidean space to the real line. Then, by introducing the concepts of "alternative points" and "alternative number", they present a method for designing DNNs for binary classification of finite labelled points on the real line, and they prove that such a classifier exists with hidden layers of width 2 and a number of hidden layers not exceeding the cardinality of the finite labelled set.
The paper also shows geometrically how the dataset is transformed layer by layer in a narrow DNN, gives an example of how hidden layers of width 2 perform binary classification on a finite dataset, and discusses how the size of a DNN (its width and depth) relates to the classification task.
The results show that, for a binary classification problem, a DNN can be constructed with hidden layers of width 2 and at most q-1 hidden layers. The paper further provides theorems on the size and capacity of DNNs, indicating that there is a class of DNN classifiers with size not exceeding 2(q-1) and parameter capacity not exceeding n+q-1, where q is the number of points in the dataset and n is the dimension of the input space.
In conclusion, the paper aims to deepen understanding of the working principles of DNNs and to provide a theoretical basis for DNN interpretability and the design of new architectures, although its results may not apply directly to real datasets.
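As a rough illustration of the projection step described above, the following Python sketch picks a random direction `w`, checks that projecting a finite set onto the real line keeps the points distinct, and counts how often the label changes along the sorted projections. It is not the paper's construction. The label-change count is only a stand-in used here for intuition, since the paper's exact definition of the alternative number is not reproduced in this summary. The function names and the XOR-style toy dataset are hypothetical. The last helper simply evaluates the bounds 2(q-1) and n+q-1 quoted above.

```python
import numpy as np


def project_injectively(points, seed=0, max_tries=100):
    """Find a direction w whose projection keeps all points distinct.

    A random Gaussian direction works with probability 1 when the input
    points are pairwise distinct; we retry a few times to be safe.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    for _ in range(max_tries):
        w = rng.standard_normal(points.shape[1])
        proj = points @ w  # one real number per point
        if len(np.unique(proj)) == len(proj):
            return w, proj
    raise ValueError("no injective direction found; are the points distinct?")


def count_label_changes(proj, labels):
    """Count label switches when walking the projected points left to right.

    Assumption: this is an illustrative quantity, not necessarily the
    paper's 'alternative number'.
    """
    order = np.argsort(proj)
    sorted_labels = np.asarray(labels)[order]
    return int(np.sum(sorted_labels[1:] != sorted_labels[:-1]))


def quoted_bounds(q, n):
    """Evaluate the size and capacity bounds quoted in the summary."""
    return 2 * (q - 1), n + q - 1


# Hypothetical toy dataset: the four XOR points in the plane.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 1, 0]

w, proj = project_injectively(points)
changes = count_label_changes(proj, labels)
size_bound, capacity_bound = quoted_bounds(q=len(points), n=2)
print(changes, size_bound, capacity_bound)
```

For this toy dataset (q = 4, n = 2) the quoted bounds evaluate to a size of at most 6 and a parameter capacity of at most 5.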