Abstract: We propose novel randomized optimization methods for high-dimensional convex problems based on restrictions of variables to random subspaces. We consider oblivious and data-adaptive subspaces and study their approximation properties via convex duality and Fenchel conjugates. A suitable adaptive subspace can be generated by sampling a correlated random matrix whose second-order statistics mirror the input data. We illustrate that the adaptive strategy can significantly outperform the standard oblivious sampling method, which is widely used in the recent literature. We show that the relative error of the randomized approximations can be tightly characterized in terms of the spectrum of the data matrix and the Gaussian width of the dual tangent cone at the optimum. We develop lower bounds for both optimization and statistical error measures based on concentration of measure and Fano's inequality. We then present the consequences of our theory for data matrices with varying spectral decay profiles. Experimental results show that the proposed approach enables significant speedups in a wide variety of machine learning and optimization problems, including logistic regression, kernel classification with random convolution layers, and shallow neural networks with rectified linear units.
What problem does this paper attempt to address?
This paper addresses the computational cost and the accuracy of high-dimensional optimization. Specifically, the authors propose a new randomized method for high-dimensional convex optimization problems that restricts the optimization variables to random subspaces. The method considers two types of subspaces, oblivious subspaces and data-adaptive subspaces, and studies their approximation properties.
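A minimal sketch of the two kinds of subspaces, under assumptions that are ours rather than the paper's: the objective is ridge regression \( \min_x \tfrac12\|Ax-b\|_2^2 + \tfrac{\lambda}{2}\|x\|_2^2 \) with \( A \in \mathbb{R}^{n\times d} \); the oblivious subspace is spanned by the rows of an \( m\times d \) Gaussian matrix; and the adaptive subspace uses \( S = GA \) with \( G \) an \( m\times n \) Gaussian matrix, so each row of \( S \) is distributed as \( \mathcal{N}(0, A^\top A) \). That construction is one reading of "a correlated random matrix whose second-order statistics mirror the input data". The helper names `ridge_opt`, `ridge_on_span` and `rel_err` are hypothetical, and the synthetic data matrix is exactly rank \( r \), an extreme case of spectral decay.

```python
import numpy as np

# Hypothetical helpers illustrating restriction of ridge regression to a random subspace.

def ridge_opt(A, b, lam):
    """Exact minimizer of 0.5||Ax - b||^2 + (lam/2)||x||^2, computed through the
    n x n system x* = A^T (A A^T + lam I)^{-1} b (cheap when n < d)."""
    n = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n), b)

def ridge_on_span(A, b, lam, S, tol=1e-10):
    """Minimize the same objective over x in range(S^T), where S is m x d.
    An orthonormal basis Q of range(S^T) gives the same restricted minimizer as
    the parametrization x = S^T z, and stays well posed if S^T is rank deficient."""
    U, s, _ = np.linalg.svd(S.T, full_matrices=False)
    Q = U[:, s > tol * s[0]]                 # d x k orthonormal basis, k <= m
    AQ = A @ Q                               # n x k reduced data matrix
    w = np.linalg.solve(AQ.T @ AQ + lam * np.eye(Q.shape[1]), AQ.T @ b)
    return Q @ w                             # lift back to R^d

rng = np.random.default_rng(1)
n, d, r, m, lam = 300, 2000, 20, 40, 0.1
# Exactly rank-r data matrix (assumed synthetic data).
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, d)) / np.sqrt(d)
b = rng.standard_normal(n)
x_star = ridge_opt(A, b, lam)

S_obl = rng.standard_normal((m, d))          # oblivious: independent of the data
S_ada = rng.standard_normal((m, n)) @ A      # adaptive: each row is A^T g ~ N(0, A^T A)

def rel_err(x):
    return np.linalg.norm(x - x_star) / np.linalg.norm(x_star)

err_obl = rel_err(ridge_on_span(A, b, lam, S_obl))
err_ada = rel_err(ridge_on_span(A, b, lam, S_ada))
print(f"oblivious: {err_obl:.2e}  adaptive: {err_ada:.2e}")
```

The ridge solution \( x^* = A^\top(AA^\top + \lambda I)^{-1}b \) lies in the row space of \( A \), and for \( m \ge r \) the adaptive subspace almost surely spans that row space, so the adaptive error here is at the level of rounding error, while a random 40-dimensional subspace of \( \mathbb{R}^{2000} \) misses almost all of \( x^* \). For data with slower spectral decay the comparison is less extreme; quantifying that dependence is what the paper's bounds do.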
### Main Problems
1. **Computational Efficiency of High-Dimensional Optimization Problems**:
- High-dimensional optimization problems are increasingly common in fields such as computer vision, natural language processing, robotics, medicine, genomics, seismology, and weather forecasting. As data volumes grow rapidly, solving these problems efficiently has become a challenge.
- Traditional optimization methods become computationally expensive on large-scale data, mainly because every iteration forms gradients and operates on the full data matrix.
2. **Control of Approximation Error**:
- Dimensionality reduction through random projection is an effective approach, but choosing a random projection matrix that keeps the approximation error small is a key issue.
- The paper pays special attention to whether an adaptive random projection matrix can significantly improve the approximation quality, and it analyzes theoretically the error in recovering the optimal solution.
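Dense Gaussian matrices are not the only choice of projection: the paper's analysis also covers subsampled randomized Hadamard transform (SRHT) embeddings, which can be applied with a fast Walsh-Hadamard transform in \( O(N\log N) \) time per column instead of \( O(mN) \) for a dense \( m\times N \) matrix. The sketch below uses the common normalization \( S = \sqrt{N/m}\, R\,(H/\sqrt{N})\, D \), with \( D \) random signs, \( H \) the Hadamard matrix and \( R \) a uniform row subsample, so that \( \mathbb{E}[S^\top S] = I \); the paper's exact normalization may differ. Forming an adaptive subspace as `srht(A, m, rng)`, that is \( S = \tilde S A \) with \( \tilde S \) an \( m\times n \) SRHT, is our assumption by analogy with the Gaussian construction \( S = GA \). The function names `fwht` and `srht` are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a fast Walsh-Hadamard transform and an SRHT sketch built on it.

def fwht(X):
    """Unnormalized fast Walsh-Hadamard transform of every column of X.
    X has shape (N, k) with N a power of 2; the cost is O(N log N) per column."""
    X = np.asarray(X, dtype=float)
    N, k = X.shape
    h = 1
    while h < N:
        Y = X.reshape(N // (2 * h), 2, h, k)   # consecutive blocks of length h, paired
        X = np.stack((Y[:, 0] + Y[:, 1], Y[:, 0] - Y[:, 1]), axis=1).reshape(N, k)
        h *= 2
    return X

def srht(X, m, rng):
    """Return S @ X for an m x N SRHT S = sqrt(N/m) * R (H / sqrt(N)) D:
    D flips signs at random, H is the Sylvester-ordered Hadamard matrix and
    R keeps m of the N rows, chosen uniformly without replacement."""
    N = X.shape[0]
    signs = rng.choice([-1.0, 1.0], size=N)
    HDX = fwht(signs[:, None] * X) / np.sqrt(N)  # orthogonal transform of X
    rows = rng.choice(N, size=m, replace=False)
    return np.sqrt(N / m) * HDX[rows]

rng = np.random.default_rng(0)
n, d, m = 256, 1000, 32                # n must be a power of 2 in this sketch
A = rng.standard_normal((n, d))
S_ada = srht(A, m, rng)                # m x d; every row is a combination of rows of A
print(S_ada.shape)                     # (32, 1000)
```

When \( n \) is not a power of 2, one option is to zero-pad \( A \) with extra rows, which leaves \( A^\top A \) unchanged.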
### Solutions
1. **Adaptive Random Subspace**:
- A method for generating an adaptive random subspace is proposed: sample a correlated random matrix whose second-order statistics mirror those of the input data.
- The adaptive strategy is shown to significantly outperform the standard oblivious sampling method, which is widely used in the recent literature.
2. **Error Analysis**:
- Using convex duality and Fenchel conjugates, the relative error of the randomized approximation is tightly characterized in terms of the spectrum of the data matrix and the Gaussian width of the dual tangent cone at the optimum.
- Lower bounds on both optimization and statistical error measures are established using concentration of measure and Fano's inequality.
3. **Application Verification**:
- Experiments show that the proposed method significantly speeds up computation across a wide range of machine learning and optimization problems, such as logistic regression, kernel classification with random convolution layers, and shallow neural networks with ReLU activations.
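The Gaussian width of a cone \( K \subseteq \mathbb{R}^d \) is \( w(K) = \mathbb{E}\sup_{u\in K,\ \|u\|_2=1}\langle g,u\rangle \) with \( g \sim \mathcal{N}(0, I_d) \). The dual tangent cone in the paper depends on the problem and its optimum, so this sketch does not compute it; it only estimates by Monte Carlo the width of two simple cones, a \( k \)-dimensional subspace (width about \( \sqrt{k} \)) and the nonnegative orthant (width about \( \sqrt{d/2} \)), to show that the width measures an effective dimension rather than the ambient dimension \( d \). Roughly, a smaller width means less is lost by restricting to a low-dimensional random subspace. The function names are hypothetical.

```python
import numpy as np

# Hypothetical Monte Carlo estimators of the Gaussian width of two simple cones.

def width_subspace(k, trials, rng):
    """Estimate E sup_{u in L, ||u||=1} <g, u> for a k-dimensional subspace L.
    The supremum equals ||P_L g||, and by rotation invariance P_L g is
    distributed as a standard Gaussian vector in R^k."""
    g = rng.standard_normal((trials, k))
    return np.linalg.norm(g, axis=1).mean()

def width_orthant(d, trials, rng):
    """Same for the nonnegative orthant in R^d: the supremum over unit u >= 0 is
    ||max(g, 0)|| whenever g has a positive entry (true w.h.p. for large d)."""
    g = rng.standard_normal((trials, d))
    return np.linalg.norm(np.maximum(g, 0.0), axis=1).mean()

rng = np.random.default_rng(0)
d = 1000
w_sub = width_subspace(50, 2000, rng)    # close to sqrt(50), about 7.07
w_orth = width_orthant(d, 2000, rng)     # close to sqrt(d / 2), about 22.36
print(w_sub, w_orth)
```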
### Theoretical Contributions
- **Strong Fenchel Duality**:
- Strong Fenchel duality relates the primal problem to its dual, which is central to understanding how right random projections (multiplying the data matrix on the right, as in \( AS^\top \), which reduces the number of variables) affect high-dimensional problems.
- **Upper Bound of Recovery Error**:
- A deterministic upper bound on the relative recovery error of the first-order estimator \( \hat{x}^{(1)} \) is derived, showing that it outperforms the zero-order estimator \( \hat{x}^{(0)} \).
- **High - Probability Upper Bound**:
- High-probability upper bounds on the recovery error are given for adaptive Gaussian embeddings and for subsampled randomized Hadamard transform (SRHT) embeddings; these bounds depend on the spectral residuals (the tail of the singular value spectrum) of the data matrix.
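Strong Fenchel duality and the first-order estimator can be illustrated on a ridge instance where every quantity has a closed form. The following are assumptions, not taken from the paper: \( f(z) = \tfrac12\|z-b\|_2^2 \), \( g(x) = \tfrac{\lambda}{2}\|x\|_2^2 \), the dual problem \( \max_y -f^*(y) - g^*(-A^\top y) \), and \( \hat{x}^{(1)} = -\tfrac{1}{\lambda} A^\top \nabla f(A\hat{x}^{(0)}) \), i.e. the zero-order estimate \( \hat{x}^{(0)} = S^\top \hat z \) pushed through the optimality condition \( x^* = -\tfrac{1}{\lambda}A^\top\nabla f(Ax^*) \). The helpers `primal`, `dual`, `grad_f`, `zero_order` and `first_order` are hypothetical names.

```python
import numpy as np

# Assumed instance (closed forms available): min_x f(Ax) + g(x) with
# f(z) = 0.5 * ||z - b||^2 and g(x) = (lam / 2) * ||x||^2.
# Conjugates: f*(y) = 0.5 * ||y||^2 + <y, b>,  g*(v) = ||v||^2 / (2 * lam).

def primal(A, b, lam, x):
    return 0.5 * np.sum((A @ x - b) ** 2) + 0.5 * lam * np.sum(x ** 2)

def dual(A, b, lam, y):
    # Fenchel dual objective: -f*(y) - g*(-A^T y)
    return -(0.5 * y @ y + y @ b) - np.sum((A.T @ y) ** 2) / (2 * lam)

def grad_f(z, b):
    return z - b

def zero_order(A, b, lam, S):
    """x_hat^(0) = S^T z_hat, where z_hat minimizes the objective over x = S^T z."""
    AS = A @ S.T
    z = np.linalg.solve(AS.T @ AS + lam * (S @ S.T), AS.T @ b)
    return S.T @ z

def first_order(A, b, lam, x0):
    """Assumed form of x_hat^(1): pass x0 through the dual variable y = grad f(A x0)
    and map back with x = -(1/lam) A^T y."""
    return -(A.T @ grad_f(A @ x0, b)) / lam

rng = np.random.default_rng(2)
n, d, m = 100, 1000, 40
A = rng.standard_normal((n, d)) / np.sqrt(d)
b = rng.standard_normal(n)
lam = 2.0 * np.linalg.norm(A, 2) ** 2        # chosen so that lam > ||A||_2^2

x_star = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n), b)   # primal optimum
y_star = grad_f(A @ x_star, b)                                  # dual optimum y* = grad f(A x*)
gap = primal(A, b, lam, x_star) - dual(A, b, lam, y_star)       # ~0 by strong duality

x0 = zero_order(A, b, lam, rng.standard_normal((m, d)))         # oblivious subspace
x1 = first_order(A, b, lam, x0)
err0 = np.linalg.norm(x0 - x_star) / np.linalg.norm(x_star)
err1 = np.linalg.norm(x1 - x_star) / np.linalg.norm(x_star)
print(f"duality gap {gap:.1e}, err0 {err0:.3f}, err1 {err1:.3f}")
```

In this instance \( \hat{x}^{(1)} - x^* = -\tfrac{1}{\lambda}A^\top A(\hat{x}^{(0)} - x^*) \), so the correction shrinks the error by at least the factor \( \|A\|_2^2/\lambda \), which equals 1/2 for the chosen \( \lambda \). That is only a simple sufficient condition for this toy case, not the deterministic bound derived in the paper.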
### Practical Significance
- **Computational Efficiency**:
- The proposed method can significantly reduce computation time and resource consumption while keeping the approximation error small.
- **Scope of Application**:
- It applies to a variety of high-dimensional optimization problems in fields including, but not limited to, machine learning, image processing, and signal processing.
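Where the computational savings come from can be seen in a sketch of \( \ell_2 \)-regularized logistic regression restricted to \( x = S^\top z \): after forming \( AS^\top \) (an \( n\times m \) matrix) and \( SS^\top \) once, each gradient step on \( z \) costs \( O(nm + m^2) \) instead of the \( O(nd) \) needed for a full gradient in \( x \). The regularizer, labels in \( \{-1,+1\} \), plain gradient descent with step \( 1/L \), the adaptive construction \( S = GA \), and the function names are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

# Hypothetical sketch: logistic regression solved in a random subspace x = S^T z.

def sigmoid_neg(t):
    """Numerically stable 1 / (1 + exp(t)), written via tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh(0.5 * t))

def logistic_in_subspace(A, y, lam, S, iters=500):
    """Gradient descent on z for x = S^T z, minimizing
    sum_i log(1 + exp(-y_i a_i^T x)) + (lam/2) ||x||^2 with y_i in {-1, +1}.
    Returns the lifted iterate S^T z and the objective value at every step."""
    AS = A @ S.T                       # n x m, formed once
    G = S @ S.T                        # m x m, so that ||S^T z||^2 = z^T G z
    # The Hessian in z is bounded by 0.25 * AS^T AS + lam * G, so 1/L is a safe step.
    L = 0.25 * np.linalg.norm(AS, 2) ** 2 + lam * np.linalg.norm(G, 2)
    z = np.zeros(S.shape[0])
    history = []
    for _ in range(iters):
        t = y * (AS @ z)               # margins y_i a_i^T S^T z, O(n m)
        history.append(np.logaddexp(0.0, -t).sum() + 0.5 * lam * z @ G @ z)
        grad = AS.T @ (-y * sigmoid_neg(t)) + lam * (G @ z)
        z = z - grad / L
    return S.T @ z, np.array(history)

rng = np.random.default_rng(0)
n, d, m, lam = 500, 2000, 30, 1e-3
A = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sign(A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))
S = rng.standard_normal((m, n)) @ A    # adaptive subspace, assumed S = G A
x_hat, hist = logistic_in_subspace(A, y, lam, S)
print(hist[0], hist[-1])
```

With the step \( 1/L \) the objective is non-increasing, and it starts at \( n\log 2 \) because \( z = 0 \).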
In conclusion, this paper addresses computational efficiency and approximation error control in high-dimensional optimization by introducing adaptive random subspace methods, providing both theoretical guarantees and practical tools for applications.