ESS-ReduNet: Enhancing Subspace Separability of ReduNet via Dynamic Expansion with Bayesian Inference

Xiaojie Yu, Haibo Zhang, Lizhi Peng, Fengyang Sun, Jeremiah Deng
2024-11-27
Abstract: ReduNet is a deep neural network model that leverages the principle of maximal coding rate **redu**ction to transform original data samples into a low-dimensional, linear discriminative feature representation. Unlike traditional deep learning frameworks, ReduNet constructs its parameters explicitly layer by layer, with each layer's parameters derived based on the features transformed from the preceding layer. Rather than directly using labels, ReduNet uses the similarity between each category's spanned subspace and the data samples for feature updates at each layer. This may lead to features being updated in the wrong direction, impairing the correct construction of network parameters and reducing the network's convergence speed. To address this issue, based on the geometric interpretation of the network parameters, this paper presents ESS-ReduNet to enhance the separability of each category's subspace by dynamically controlling the expansion of the overall spanned space of the samples. Meanwhile, label knowledge is incorporated with Bayesian inference to encourage the decoupling of subspaces. Finally, stability, as assessed by the condition number, serves as an auxiliary criterion for halting training. Experiments on the ESR, HAR, Covertype, and Gas datasets demonstrate that ESS-ReduNet achieves more than 10x improvement in convergence compared to ReduNet. Notably, on the ESR dataset, the features transformed by ESS-ReduNet achieve a 47% improvement in SVM classification accuracy.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two main problems that ReduNet encounters during feature updates and network training:

1. **Wrong feature update direction**: ReduNet updates features by estimating the similarity between each category's subspace and the samples, rather than by using labels directly. In the early stage of training this estimate can be very inaccurate, so features are updated in the wrong direction, which impairs the construction of the network parameters and slows convergence.
2. **Slow convergence and degrading feature quality**: Because the overall spanned space is restricted, the subspaces of different categories can become entangled. The category membership of samples then cannot be estimated reliably, which causes incorrect feature updates and forms a vicious cycle. In addition, as the number of layers increases, feature quality gradually declines because of the poor network structure, even though the objective function appears to stabilize.

To address these problems, the paper proposes the ESS-ReduNet framework, with the following improvements:

- **Dynamic control of the expansion process**: A weight function dynamically adjusts the strength of the expansion operator, which enhances the separation between the subspaces of different categories and steers samples toward the correct subspace.
  \[ w(\tau_{\ell})=\min(\exp(\tau_{\ell}), u)\in[1, u],\quad\tau_{\ell}\in[0,\infty) \]
- **Label knowledge via Bayesian inference**: The estimated category memberships are compared with the true labels, and Bayesian inference is used to correct the estimation error. This avoids the inconsistency that using labels directly would cause.
  \[ p_{ij} = P(z_{\ell}\in C_{i}\mid z_{\ell}\to C_{j})=\frac{P(z_{\ell}\in C_{i})\,P(z_{\ell}\to C_{j}\mid z_{\ell}\in C_{i})}{\sum_{k}P(z_{\ell}\to C_{j}\mid z_{\ell}\in C_{k})\,P(z_{\ell}\in C_{k})} \]
- **Condition number as an auxiliary stopping criterion**: The condition number measures the stability of the linear system and serves as an auxiliary criterion for deciding when to stop training, which saves computing resources and preserves feature quality.

The experimental results show that ESS-ReduNet significantly accelerates network convergence and improves classification accuracy on multiple datasets. For example, on the ESR dataset, the SVM classification accuracy on the transformed features improves by 47%, and the network converges more than 10 times faster.
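The three improvement measures described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the way the likelihood is estimated from hypothetical counts of true versus estimated categories, and the choice of which matrix to check and what threshold to use for the condition number are all assumptions made for illustration. The summary also does not say how \(\tau_{\ell}\) or \(u\) are chosen, so they are plain inputs here.

```python
import numpy as np


def expansion_weight(tau, u):
    """Capped exponential weight w(tau) = min(exp(tau), u), for tau >= 0 and u >= 1."""
    return min(float(np.exp(tau)), float(u))


def membership_posterior(prior, likelihood):
    """Bayes' rule for p[i, j] = P(z in C_i | z -> C_j).

    prior[i]         = P(z in C_i)
    likelihood[i, j] = P(z -> C_j | z in C_i); each row sums to 1
    Assumes every category is estimated at least once (no zero column sums).
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = prior[:, None] * likelihood   # P(z in C_i and z -> C_j)
    evidence = joint.sum(axis=0)          # P(z -> C_j), summed over true categories
    return joint / evidence               # column j is a distribution over i


def should_stop(matrix, max_condition_number):
    """Auxiliary stop check: True once the matrix is too ill-conditioned."""
    return bool(np.linalg.cond(matrix) > max_condition_number)


# Hypothetical counts: rows are true categories, columns are estimated categories.
counts = np.array([[40.0, 10.0],
                   [20.0, 30.0]])
prior = counts.sum(axis=1) / counts.sum()
likelihood = counts / counts.sum(axis=1, keepdims=True)
posterior = membership_posterior(prior, likelihood)
```

Each column of `posterior` sums to 1, so column `j` can be read as the corrected membership distribution for samples that were estimated to belong to category `j`. The cap `u` keeps the expansion weight bounded no matter how large `tau` becomes.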