Abstract: We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.
Machine Learning, Artificial Intelligence, Mathematical Physics, Optimization and Control
What problem does this paper attempt to address?
This paper addresses how to construct explicit global minimizers for a specific class of deep ReLU neural networks, in particular on "sequentially separable" data sets. The authors aim to understand and explain the role of each layer of the network: by constructing truncation maps, they collapse each class of data to a point and thereby achieve zero-loss classification.
### Specific description of the problem
1. **Data types**:
- The paper considers two types of data configurations:
  - **Sufficiently small and well-separated clusters**: The data points of each class form a small cluster that is well separated from the clusters of the other classes.
  - **Sequentially linearly separable data**: At each step, a hyperplane separates the data of one class from the data of the remaining classes.
2. **Objectives**:
- For the given training data \( X_0=\bigcup_{j = 1}^Q X_{0,j}\subset\mathbb{R}^M\), where \( Q\) is the number of categories and \( M\) is the input dimension, the objective is to find a ReLU neural network that can classify these data with zero loss.
- Specifically, the authors seek a ReLU neural network with \( Q + 1\) layers whose weights and biases can be written explicitly in terms of cumulative parameters, so that the network attains the global minimum of the loss on the training data.
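
As a sketch of what this objective amounts to, the network and the zero-loss condition can be written out as follows. The layer indexing, the symbol \( y_j\) for the reference output of class \( j\), and the absence of an activation in the final layer are notational assumptions made here for illustration, not quoted from the paper.

\[
x^{(0)} = x \in \mathbb{R}^M, \qquad
x^{(\ell)} = \sigma\bigl(W_\ell\, x^{(\ell-1)} + b_\ell\bigr) \quad (\ell = 1,\dots,Q), \qquad
x^{(Q+1)} = W_{Q+1}\, x^{(Q)} + b_{Q+1},
\]
\[
\text{zero loss:} \qquad x^{(Q+1)} = y_j \quad \text{for all } x \in X_{0,j}, \quad j = 1,\dots,Q .
\]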
### Solutions
1. **Truncation Map**:
- The authors introduce the truncation map \(\tau_{W,b}(x)=(W)^+(\sigma(Wx + b)-b)\), where \( W\) is the weight matrix, \( b\) is the bias vector, \(\sigma\) is the ReLU activation function, and \((W)^+\) denotes the generalized inverse of \( W\).
- The role of the truncation map is to project certain regions in the input space (such as the backward cone) to a point while keeping other regions (such as the forward cone) unchanged.
2. **Recursively construct global minimum solutions**:
- In each layer, the weight matrix \( W\) and bias vector \( b\) are chosen so that one class of data is collapsed to a point while the data of the other classes are left unchanged. Proceeding recursively through the layers, every class ends up mapped to its own point.
- The last layer is an affine transformation used to match the class averages with the reference output.
3. **Theoretical results**:
- For sufficiently small and well-separated clusters, the authors prove that a ReLU neural network can achieve zero-loss classification with \( Q(M+Q/2)\) parameters.
- For sequentially linearly separable data, the authors prove that a ReLU neural network with \( Q + 1\) layers can achieve zero-loss classification, with input and hidden layer widths \( d_0=d_1=\cdots=d_Q = M\geq Q\) and output layer width \( d_{Q + 1}=Q\).
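
The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' construction: it composes truncation maps directly on input space instead of building network weights from cumulative parameters, it takes \((W)^+\) to be the Moore-Penrose pseudoinverse, and the toy data, the cone apexes `p1` and `p2`, the one-hot reference outputs, and the rank-one final affine map are all hypothetical choices made for the example.

```python
import numpy as np

def truncation_map(W, b, X):
    """Apply tau_{W,b}(x) = W^+ (relu(W x + b) - b) to each column of X.

    Assumption: W^+ is the Moore-Penrose pseudoinverse of W.
    """
    b = b.reshape(-1, 1)
    return np.linalg.pinv(W) @ (np.maximum(W @ X + b, 0.0) - b)

# Hypothetical toy data in R^2 (M = 2, Q = 2); each column is one data point.
X1 = np.array([[0.5, 0.0],
               [0.2, -1.0]])   # class 1: points (0.5, 0.2) and (0.0, -1.0)
X2 = np.array([[3.0, 2.5],
               [2.5, 3.5]])    # class 2: points (3.0, 2.5) and (2.5, 3.5)

# Step 1: class 1 lies in the backward cone {x : W1 x + b1 <= 0} with apex p1,
# so it collapses to p1; class 2 lies in the forward cone and is unchanged.
p1 = np.array([1.0, 1.0])
W1 = np.eye(2)
b1 = -W1 @ p1

# Step 2: class 2 lies in the backward cone of (W2, b2) with apex p2, so it
# collapses to p2; the point p1 lies in the forward cone and is unchanged.
p2 = np.array([2.0, 2.0])
W2 = -np.eye(2)
b2 = -W2 @ p2

# Final affine map sending p1 -> y1 and p2 -> y2 (one-hot targets, assumed).
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d = p2 - p1
W3 = np.outer(y2 - y1, d) / (d @ d)
b3 = y1 - W3 @ p1

def classify(X):
    """Two truncation steps followed by the final affine map."""
    X = truncation_map(W1, b1, X)
    X = truncation_map(W2, b2, X)
    return W3 @ X + b3.reshape(-1, 1)

out1 = classify(X1)  # every column should equal y1
out2 = classify(X2)  # every column should equal y2
```

With an invertible \( W\), points in the forward cone satisfy \( Wx + b \geq 0\) componentwise, so the ReLU acts as the identity and the truncation map returns \( x\). Points in the backward cone satisfy \( Wx + b \leq 0\), so the ReLU returns zero and the map returns the apex \( -W^{-1}b\).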
### Conclusion
By introducing truncation maps and cumulative parameters, the authors construct explicit global minimizers for the two data configurations considered. The construction explains the role of each layer of the network and offers a new perspective on the optimization problem for deep ReLU neural networks.