Golden Ratio-Based Sufficient Dimension Reduction

Wenjing Yang, Yuhong Yang
2024-10-25
Abstract: Many machine learning applications deal with high-dimensional data. To make computations feasible and learning more efficient, it is often desirable to reduce the dimensionality of the input variables by finding linear combinations of the predictors that retain as much of the original information as possible about the relationship between the response and the original predictors. We propose a neural network-based sufficient dimension reduction method that not only identifies the structural dimension effectively, but also estimates the central space well. It takes advantage of the approximation capabilities of neural networks for functions in Barron classes and leads to reduced computation cost compared to other dimension reduction methods in the literature. Additionally, the framework can be extended to fit practical dimension reduction, making the methodology more applicable in practical settings.
Machine Learning, Methodology
What problem does this paper attempt to address?
This paper addresses how to reduce the dimension of the input variables in high-dimensional data while retaining as much as possible of the original information about the relationship between the response variable and the original predictors. Specifically, it proposes a neural-network-based sufficient dimension reduction method (GRNN-SDR) that both identifies the structural dimension and estimates the central subspace well. The method relies on the approximation ability of neural networks for Barron-type functions. Compared with other dimension reduction methods in the literature, it reduces the computational cost and is better suited to practical settings.

### Main Contributions

1. **Dynamically Searching for Structural Dimension**: The method searches for the structural dimension dynamically by using the golden ratio, which reduces computation time and complexity.
2. **Theoretical Basis**: Under appropriate conditions, the paper establishes theoretical results showing that the method finds the true structural dimension with high probability.
3. **No Prior Knowledge Required**: Unlike most existing methods, which assume the structural dimension is known, this method can recover the true or practical structural dimension without prior knowledge.
4. **High Precision and Stability**: Extensive experimental comparisons show that the method estimates the central subspace more accurately in most cases and is more stable when the true dimension is not very small.
5. **Algorithm Complexity**: For a fixed neural network structure, the algorithm complexity is \(O(N)\), where \(N\) is the sample size, which makes it a promising solution to the dimension reduction challenge.
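The summary does not spell out how the golden ratio drives the dimension search. One plausible reading is a golden-section search over integer candidate dimensions, sketched below. This is an illustration under stated assumptions, not the paper's algorithm: the function name `golden_section_search_int` and the `criterion` callable are hypothetical, and the search is only valid if the criterion is unimodal in the candidate dimension `d`.

```python
import math

# Reciprocal of the golden ratio, about 0.618.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_search_int(criterion, lo, hi):
    """Minimize criterion(d) over the integers lo..hi (inclusive).

    ASSUMPTION: criterion is unimodal in d. `criterion` is a hypothetical
    stand-in for whatever score the method assigns to a candidate dimension
    (for example, a penalized validation error from a fitted network).
    Returns (best_d, number_of_criterion_evaluations).
    """
    cache = {}

    def f(d):
        # Each candidate dimension is evaluated at most once.
        if d not in cache:
            cache[d] = criterion(d)
        return cache[d]

    while hi - lo > 2:
        # Place two interior probes at golden-ratio positions.
        step = max(1, round(INV_PHI * (hi - lo)))
        left, right = hi - step, lo + step
        if left >= right:
            # Integer rounding can collapse the probes; separate them.
            left = right - 1
        if f(left) <= f(right):
            hi = right  # a minimizer lies in [lo, right]
        else:
            lo = left   # a minimizer lies in [left, hi]

    # At most three candidates remain; compare them directly.
    best = min(range(lo, hi + 1), key=f)
    return best, len(cache)


if __name__ == "__main__":
    # Toy criterion with its minimum at d = 7 (illustrative only).
    best_d, n_evals = golden_section_search_int(lambda d: (d - 7) ** 2, 1, 50)
    print(best_d, n_evals)
```

The point of the sketch is that the number of candidate dimensions actually fitted grows roughly logarithmically with the range of candidates, instead of fitting one model per dimension.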
### Method Overview

- **Preliminary Definition**: Define the goal of sufficient dimension reduction: find a set of linear combinations \(\beta^T X\) such that \(Y \perp\!\!\!\perp X \mid \beta^T X\), where \(Y\) is the response variable and \(X\) is the \(p\)-dimensional predictor.
- **Approximation Bound**: Approximate the objective function \(g(z)\) by the neural network model \(g_m(z, \theta)\) and define the approximation error in the \(L_2\) norm.
- **Training Error Bound**: Analyze the error bound for actual neural network training and propose a criterion for selecting the optimal structural dimension.
- **Structural Dimension Analysis**: Select the optimal structural dimension by minimizing the mean squared error (MSE) on the validation set plus a penalty term.
- **Algorithm Implementation**: Describe in detail the neural network learning process and the specific steps of the dynamic search for the structural dimension.

### Experimental Results

- **Noise Level**: Across the noise levels tested, GRNN-SDR shows the best estimation accuracy and the smallest mean squared error.
- **Sample Size and Computation Time**: As the sample size increases, GRNN-SDR performs significantly better than the other methods, and its computation time grows linearly with the sample size.
- **Covariate Distribution**: GRNN-SDR continues to perform well under different covariate distributions.
- **Feature Dimension Size**: With high-dimensional features, GRNN-SDR still reduces the dimension effectively and maintains high estimation accuracy.

In conclusion, the paper proposes a novel neural-network-based sufficient dimension reduction method, addresses key problems in high-dimensional data processing, and demonstrates superior performance in several respects.
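The selection rule described under Structural Dimension Analysis (validation MSE plus a penalty term) can be sketched as follows. The summary does not give the form of the penalty, so the term `penalty_weight * d * log(n) / n` used here is purely an assumption chosen for illustration, as are the function names `penalized_score` and `select_dimension`.

```python
import numpy as np


def penalized_score(y_val, y_pred, d, penalty_weight=1.0):
    """Validation MSE plus a penalty that grows with the candidate dimension d.

    ASSUMPTION: the penalty penalty_weight * d * log(n) / n is illustrative
    only; the source does not state the penalty actually used.
    """
    y_val = np.asarray(y_val, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_val.shape[0]
    mse = np.mean((y_val - y_pred) ** 2)
    return mse + penalty_weight * d * np.log(n) / n


def select_dimension(y_val, preds_by_dim, penalty_weight=1.0):
    """Return the candidate dimension with the smallest penalized score.

    preds_by_dim maps each candidate dimension d to the validation
    predictions of a model fitted with a d-dimensional reduction.
    """
    scores = {
        d: penalized_score(y_val, pred, d, penalty_weight)
        for d, pred in preds_by_dim.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy predictions: d = 1 underfits, d = 2 and d = 3 fit equally well,
    # so the penalty breaks the tie in favour of the smaller dimension.
    y = np.arange(100.0)
    preds = {1: y + 1.0, 2: y, 3: y}
    print(select_dimension(y, preds))
```

In this toy setup the penalty matters only when two candidate dimensions fit the validation data about equally well, which is the situation a penalty term is meant to resolve.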