Eric Dolores-Cuenca, Aldo Guzman-Saenz, Sangil Kim, Susana Lopez-Moreno, Jose Mendoza-Cortes
Abstract: The paper ``Tropical Geometry of Deep Neural Networks'' by L. Zhang et al. introduces an equivalence between integer-valued neural networks (IVNN) with activation $\text{ReLU}_{t}$ and tropical rational functions, which come with a map to polytopes. Here, IVNN refers to a network with integer weights but real biases, and $\text{ReLU}_{t}$ is defined as $\text{ReLU}_{t}(x)=\max(x,t)$ for $t\in\mathbb{R}\cup\{-\infty\}$.
For every poset with $n$ points, there exists a corresponding order polytope, i.e., a convex polytope in the unit cube $[0,1]^n$ whose coordinates obey the inequalities of the poset. We study neural networks whose associated polytope is an order polytope. We then explain how posets with four points induce neural networks that can be interpreted as $2\times 2$ convolutional filters. These poset filters can be added to any neural network, not only IVNN.
Similarly to maxout, poset convolutional filters update the weights of the neural network during backpropagation with more precision than average pooling, max pooling, or mixed pooling, without the need to train extra parameters. We report experiments that support our statements.
We also prove that the assignment from a poset to an order polytope (and to certain tropical polynomials) is one to one, and we define the structure of algebra over the operad of posets on tropical polynomials.
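To make the order polytope definition in the abstract concrete, here is a minimal Python sketch of a membership test. It assumes the standard convention that a point $x\in[0,1]^n$ lies in the order polytope when $x_i \le x_j$ for every relation $i \le j$ in the poset; the paper may use the reverse orientation. The function name `in_order_polytope` and the encoding of the poset as a list of index pairs are hypothetical choices made for illustration.

```python
def in_order_polytope(x, relations, tol=1e-12):
    """Return True if x lies in the order polytope of a poset.

    x         : sequence of n coordinates, one per poset point
    relations : list of (i, j) pairs meaning "point i <= point j"
                (assumed convention: this forces x[i] <= x[j])
    tol       : small tolerance for floating-point comparisons
    """
    # Every coordinate must lie in the unit interval [0, 1].
    if any(v < -tol or v > 1 + tol for v in x):
        return False
    # Every poset relation must hold between the matching coordinates.
    return all(x[i] <= x[j] + tol for i, j in relations)


# A chain on four points: 0 <= 1 <= 2 <= 3 (covering relations suffice).
chain = [(0, 1), (1, 2), (2, 3)]
inside = in_order_polytope([0.1, 0.2, 0.5, 0.9], chain)
outside = in_order_polytope([0.5, 0.2, 0.5, 0.9], chain)
```

Listing only the covering relations is enough here, since the remaining inequalities follow by transitivity.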
What problem does this paper attempt to address?
This paper combines order theory with convolutional neural networks (CNNs) and proposes a new type of convolutional filter intended to improve CNN performance on image processing and other tasks. Specifically, the paper focuses on the following aspects:
1. **Introducing convolutional filters based on order theory**:
- By studying partially ordered sets (posets) and their corresponding order polytopes, the paper proposes new convolutional filters based on these mathematical structures.
- According to the paper, these filters update the weights of a neural network more precisely than traditional pooling methods (max pooling, average pooling, and mixed pooling) and reduce the information loss those methods incur.
2. **Improving the accuracy of backpropagation**:
- Traditional pooling methods can discard gradient information during backpropagation. For example, max pooling passes the gradient only to the maximum entry, and average pooling distributes the gradient equally across all entries, which may lead to imprecise weight updates.
- The proposed poset filters are designed to route gradients more selectively during backpropagation, with the aim of improving how well the model learns.
3. **Reducing computational complexity**:
- The paper also explores how to reduce the amount of computation by selecting specific poset structures (such as four-point chains) while maintaining performance similar to more complex filters.
4. **Verifying the effectiveness of the new method**:
- Through a series of experiments, the paper evaluates the proposed poset filters on different neural network architectures (such as Quaternion Neural Network, DenseNet, and CNN) and reports improved performance over traditional pooling methods.
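The difference in gradient routing described in item 2 can be illustrated on a single $2\times 2$ patch. The sketch below is not the paper's implementation: it computes, by hand, one valid (sub)gradient of max pooling, average pooling, and the first poset filter from the formula summary (the maximum over 0 and the prefix sums of the patch read in row-major order). The function names are hypothetical, and ties are broken by taking the first maximizer.

```python
import numpy as np


def max_pool_grad(patch):
    """Gradient of max pooling: 1 at the (first) maximum entry, 0 elsewhere."""
    a = np.asarray(patch, dtype=float).ravel()
    g = np.zeros_like(a)
    g[np.argmax(a)] = 1.0
    return g.reshape(2, 2)


def avg_pool_grad(patch):
    """Gradient of average pooling: every entry receives 1/4."""
    return np.full((2, 2), 0.25)


def chain_filter_grad(patch):
    """Subgradient of max{0, a00, a00+a01, a00+a01+a10, a00+a01+a10+a11}.

    The entries that make up the winning prefix sum receive gradient 1;
    if the constant 0 wins, no entry receives any gradient.
    """
    a = np.asarray(patch, dtype=float).ravel()  # row-major: a00, a01, a10, a11
    prefix = np.cumsum(a)
    g = np.zeros_like(a)
    k = int(np.argmax(prefix))
    if prefix[k] > 0:
        g[: k + 1] = 1.0
    return g.reshape(2, 2)


patch = [[1.0, -2.0], [3.0, -1.0]]
g_max = max_pool_grad(patch)        # only the entry 3.0 gets gradient
g_avg = avg_pool_grad(patch)        # all four entries get 0.25
g_chain = chain_filter_grad(patch)  # the first three entries get gradient 1
```

For this patch the prefix sums are 1, -1, 2, 1, so the filter selects the first three entries. The set of entries receiving gradient therefore depends on the input, unlike average pooling, and can contain more than one entry, unlike max pooling.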
### Formula summary
- **ReLU activation function**:
\[
\text{ReLU}_t(x)=\max(x, t)\quad\text{for }t\in\mathbb{R}\cup\{-\infty\}
\]
- **Poset filter example**:
- For the input matrix \(\begin{bmatrix}a_{0,0}&a_{0,1}\\a_{1,0}&a_{1,1}\end{bmatrix}\), the first filter is calculated as follows:
\[
\max\left\{0, a_{0,0}, a_{0,0}+a_{0,1}, a_{0,0}+a_{0,1}+a_{1,0}, a_{0,0}+a_{0,1}+a_{1,0}+a_{1,1}\right\}
\]
- The second filter is calculated as follows:
\[
\max\left\{0,\ \max_{(i,j)} a_{i,j},\ \max_{(i,j)\neq(k,l)}\{a_{i,j}+a_{k,l}\},\ \max_{(i,j),(k,l),(m,n)\ \text{pairwise distinct}}\{a_{i,j}+a_{k,l}+a_{m,n}\},\ a_{0,0}+a_{0,1}+a_{1,0}+a_{1,1}\right\}
\]
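The formulas above can be evaluated directly. The following is a minimal Python sketch, assuming NumPy and a row-major reading of the patch ($a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}$); the names `relu_t`, `chain_filter`, and `subset_filter` are hypothetical and do not come from the paper.

```python
import itertools

import numpy as np


def relu_t(x, t=0.0):
    """ReLU_t(x) = max(x, t); t = -inf gives the identity map."""
    return np.maximum(x, t)


def chain_filter(patch):
    """First filter: max over 0 and the prefix sums a00, a00+a01, ..."""
    a = np.asarray(patch, dtype=float).ravel()  # row-major order
    return float(max(0.0, np.cumsum(a).max()))


def subset_filter(patch):
    """Second filter: max over 0 and, for k = 1..4, the largest sum of
    k distinct entries of the patch."""
    a = np.asarray(patch, dtype=float).ravel()
    best = 0.0
    for k in range(1, a.size + 1):
        for idx in itertools.combinations(range(a.size), k):
            best = max(best, float(a[list(idx)].sum()))
    return best


patch = [[1.0, -2.0], [3.0, -1.0]]
first = chain_filter(patch)    # prefix sums are 1, -1, 2, 1
second = subset_filter(patch)  # best subset is {1, 3}
```

Because the second filter ranges over every subset of entries (including the empty one, via the 0 term), it equals the sum of the positive entries of the patch, which gives a cheaper way to compute it than enumerating subsets.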
Through these methods, the paper uses mathematical tools from order theory to inform the design of convolutional neural networks and improve their performance.