Generalization Bounds for Sparse Random Feature Expansions

Abolfazl Hashemi, Hayden Schaeffer, Robert Shi, Ufuk Topcu, Giang Tran, Rachel Ward
DOI: https://doi.org/10.48550/arXiv.2103.03191
2021-08-21
Abstract: Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature expansion to obtain parsimonious random feature models. Specifically, we leverage ideas from compressive sensing to generate random feature expansions with theoretical guarantees even in the data-scarce setting. In particular, we provide generalization bounds for functions in a certain class (that is dense in a reproducing kernel Hilbert space) depending on the number of samples and the distribution of features. The generalization bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. In particular, by introducing sparse features, i.e. features with random sparse weights, we provide improved bounds for low order functions. We show that the sparse random feature expansions outperform shallow networks in several scientific machine learning tasks.
Machine Learning, Numerical Analysis, Optimization and Control, Probability
What problem does this paper attempt to address?
The paper addresses how to improve the generalization of random feature methods when data are scarce, especially when approximating high-dimensional, low-order functions. It introduces the Sparse Random Feature Expansion (SRFE), which uses ideas from compressed sensing to build a random feature expansion with theoretical guarantees and good generalization even in the data-scarce setting. The paper also provides generalization bounds for a class of functions that is dense in a reproducing kernel Hilbert space; these bounds depend on the number of samples and on the feature distribution. In addition, by introducing sparse features (i.e., features with random sparse weights), the paper obtains improved bounds for low-order functions and shows that the sparse random feature expansion outperforms shallow networks in several scientific machine-learning tasks.

### Background and Motivation of the Paper

1. **Advantages and Limitations of the Random Feature Method**
   - Random feature methods have been successful in various machine-learning tasks. They are easy to compute and come with theoretical accuracy bounds.
   - Compared with standard neural networks, random feature methods can represent similar function spaces without a costly training phase.
   - However, to be accurate, random feature methods usually require more measurements than trainable parameters, which limits their use in data-scarce applications and in scientific machine-learning problems.
2. **Proposal of Sparse Random Feature Expansion (SRFE)**
   - To overcome these limitations, the paper proposes the Sparse Random Feature Expansion (SRFE) to obtain parsimonious random feature models.
   - SRFE uses ideas from compressed sensing to generate a random feature expansion with theoretical guarantees, and it remains effective when data are scarce.
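The idea described above (random features with sparse weights, followed by a sparse fit of the coefficients) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian weight distribution, the cosine feature map with a random phase, the value of `lam`, and the test function are all hypothetical choices made for this sketch. The sparse recovery step is done here with a plain ISTA solver for the LASSO problem, a stand-in chosen for simplicity and not necessarily the solver used in the paper.

```python
import numpy as np


def draw_sparse_weights(d, n_features, q, rng):
    """Draw n_features weight vectors in R^d, each with exactly q nonzero entries.

    Assumption: nonzero entries are standard normal and phases are uniform.
    """
    W = np.zeros((n_features, d))
    for j in range(n_features):
        support = rng.choice(d, size=q, replace=False)  # random coordinate subset
        W[j, support] = rng.standard_normal(q)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def features(X, W, b):
    """Random feature matrix with entries cos(<x_i, w_j> + b_j)."""
    return np.cos(X @ W.T + b)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A c - y||^2 + lam * ||c||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, n_features, q = 10, 100, 500, 2  # fewer samples than features

    def target(X):
        # Hypothetical low-order target: each term depends on at most 2 coordinates.
        return np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

    X_train = rng.uniform(-1.0, 1.0, size=(m, d))
    X_test = rng.uniform(-1.0, 1.0, size=(1000, d))
    W, b = draw_sparse_weights(d, n_features, q, rng)

    c = ista_lasso(features(X_train, W, b), target(X_train), lam=0.05)
    pred = features(X_test, W, b) @ c
    rel_err = np.linalg.norm(pred - target(X_test)) / np.linalg.norm(target(X_test))
    print("nonzero coefficients:", np.count_nonzero(c), "of", n_features)
    print("relative test error:", rel_err)
```

The sketch uses more features than samples on purpose, since that is the data-scarce regime the summary refers to; the ℓ1 penalty is what keeps the fitted coefficient vector sparse in that regime.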
### Main Contributions

1. **Proposal of the Sparse Feature Model**
   - A new sparse feature model (SRFE) is proposed. It improves on compressed sensing and polynomial chaos expansion (PCE) approaches by using random Fourier features (RFF).
   - SRFE outperforms standard shallow neural networks when data are limited.
2. **Theoretical Analysis**
   - Bounds on the sample complexity and the feature complexity are provided; these bounds control the error between the SRFE and the target function.
   - It is proved that, for high-dimensional low-order functions, SRFE achieves a generalization bound of \(O(N^{-1/2})\), where the constant depends polynomially rather than exponentially on the dimension, thus overcoming the curse of dimensionality.
3. **Selection of Sparse Features**
   - By introducing sparse feature weights, SRFE performs well in approximating low-order functions and helps alleviate the difficulty of approximating high-dimensional functions.

### Mathematical Formulas

- **Generalization Bound**
  \[ \sqrt{\int_{\mathbb{R}^d} |f(x) - f^\sharp(x)|^2 d\mu} \leq C' \left(1 + \frac{N^{1/2}}{s^{1/2}} m^{-1/4} \log^{1/4} \left(\frac{1}{\delta}\right)\right) \kappa_{s,1}(c^\star) + C \left(1 + \frac{N^{1/2}}{m^{1/4}} \log^{1/4} \left(\frac{1}{\delta}\right)\right) \sqrt{\epsilon^2 \|f\|^2_\rho + 4\nu^2} \]
- **Definition of Sparse Feature Weights**
  \[ \tilde{c}^\star_j := \frac{1}{K} \sum_{\ell = 1}^K \tilde{c}^\star_{\ell,j}, \quad \text{where} \quad \tilde{c}^\star_{\ell,j} = \begin{cases} \frac{\alpha_\ell(\omega_j)}{n \rho(\omega_