A method for variable selection in a multivariate functional linear regression model

Alban Mina Mbina, Guy Martial Nkiet
2023-11-02
Abstract: We propose a new variable selection procedure for a functional linear model with multiple scalar responses and multiple functional predictors. This method is based on basis expansions of the involved functional predictors and coefficients that lead to a multivariate linear regression model. Then a criterion by means of which the variable selection problem reduces to that of estimating a suitable set is introduced. Estimation of this set is achieved by using appropriate penalizations of estimates of this criterion, so leading to our proposal. A simulation study that permits to investigate the effectiveness of the proposed approach and to compare it with existing methods is given.
Statistics Theory
What problem does this paper attempt to address?
This paper addresses variable selection in the multivariate functional linear regression model. Specifically, the authors propose a new method for identifying the functional predictors that are truly relevant to the responses. The problem matters in statistical modeling because, when many candidate predictors are available, it is crucial to determine which of them actually help explain the response variables.

### Background and Motivation

- **Multivariate Linear Regression Model**: The multivariate linear regression model has been studied extensively from many perspectives. One important issue is variable selection, that is, determining which of a large number of predictor variables are truly important for explaining the response variables.
- **Functional Data Analysis (FDA)**: In recent years, statistical methods for data in the form of curves have developed significantly, forming a very active field of statistics known as Functional Data Analysis (FDA). Within FDA, the functional linear regression model describes the relationship between several functional predictors and one or more response variables.
- **Existing Methods**: Some methods already handle variable selection in regression with multiple functional predictors, but most of them are extensions of traditional multivariate linear regression methods.

### Main Contributions of the Paper

- **New Method**: The authors propose a method based on basis expansions, which transforms the initial multivariate functional linear regression model into a multivariate linear regression model and thereby simplifies the variable selection problem.
- **Selection Criterion**: A criterion is introduced that reduces variable selection to the estimation of a suitable set. This set is estimated by applying appropriate penalizations to estimates of the criterion.
- **Performance Evaluation**: The effectiveness of the proposed variable selection strategy is studied through Monte Carlo simulations and compared with the random subspace method and the group SCAD method.

### Method Overview

1. **Model Definition**: Consider the multivariate functional linear regression model
   \[ Y_j = \sum_{\ell = 1}^p \int_{I_\ell} B_{j\ell}(t)X_\ell(t)\,dt + \epsilon_j \]
   where \(Y_j\) is the \(j\)-th scalar response, \(X_\ell(t)\) is the \(\ell\)-th functional predictor, \(B_{j\ell}(t)\) is the corresponding functional coefficient, and \(\epsilon_j\) is the error term.
2. **Basis Representation**: Use basis expansions to represent the functional coefficients and predictors as
   \[ B_{j\ell}(t)\approx\sum_{k = 1}^{d_\ell}b_{jk\ell}\phi_{k\ell}(t) \]
   \[ X_\ell(t)\approx\sum_{k = 1}^{d_\ell}X_{k\ell}\phi_{k\ell}(t) \]
   where the \(\phi_{k\ell}(t)\) are basis functions of \(L^2(I_\ell)\) and \(d_\ell\) is the dimension parameter.
3. **Selection Criterion**: Introduce a selection criterion \(\xi_K\), which reduces variable selection to the estimation of an appropriate set:
   \[ \xi_K=\|C_{12}-C_1\Pi_KC_{12}\| \]
   where \(\Pi_K = A_K^T(A_KC_1A_K^T)^{-1}A_K\).
4. **Variable Selection**: Select the tuning parameters \(\alpha\) and \(\beta\) by minimizing the cross-validation index \(CV(\alpha,\beta)\).

### Conclusion

The proposed method performs well for variable selection in the multivariate functional linear regression model, particularly when several functional predictors are involved.
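
### Illustrative Sketch of the Criterion

To make the selection criterion from the Method Overview more concrete, here is a minimal NumPy sketch of how \(\xi_K\) could be evaluated once the functional predictors have been replaced by their basis coefficients. It is not the authors' implementation. Several things are assumptions made for illustration only: \(C_1\) is taken to be the covariance matrix of the stacked basis coefficients, \(C_{12}\) their cross-covariance with the responses, \(A_K\) a 0/1 matrix that keeps the coefficient blocks of the predictors in \(K\), and the norm is the Frobenius norm. The helper names `selection_matrix` and `xi` and the toy dimensions are hypothetical.

```python
import numpy as np

def selection_matrix(groups, K, n_coef):
    """Build A_K (assumed form): rows of the identity that keep the
    coefficient columns of the functional predictors listed in K."""
    idx = np.concatenate([groups[l] for l in K])
    return np.eye(n_coef)[idx]

def xi(C1, C12, A_K):
    """xi_K = ||C12 - C1 Pi_K C12|| with Pi_K = A_K^T (A_K C1 A_K^T)^{-1} A_K.
    The Frobenius norm is used here as an assumption."""
    Pi_K = A_K.T @ np.linalg.solve(A_K @ C1 @ A_K.T, A_K)
    return np.linalg.norm(C12 - C1 @ Pi_K @ C12)

# Hypothetical toy setup: 3 functional predictors with 2 basis coefficients
# each, 2 scalar responses; predictor 1 has no effect on the responses.
rng = np.random.default_rng(0)
groups = [[0, 1], [2, 3], [4, 5]]   # coefficient indices per predictor
n_coef = 6

M = rng.standard_normal((n_coef, n_coef))
C1 = M @ M.T + n_coef * np.eye(n_coef)   # positive definite covariance
B = rng.standard_normal((n_coef, 2))     # coefficient matrix b_{jkl}, stacked
B[groups[1], :] = 0.0                    # predictor 1 is irrelevant
C12 = C1 @ B   # cross-covariance implied by Y = B^T X + noise uncorrelated with X

xi_all = xi(C1, C12, selection_matrix(groups, [0, 1, 2], n_coef))
xi_relevant = xi(C1, C12, selection_matrix(groups, [0, 2], n_coef))
xi_missing = xi(C1, C12, selection_matrix(groups, [0, 1], n_coef))

print("all predictors:        ", xi_all)
print("relevant predictors:   ", xi_relevant)
print("one relevant left out: ", xi_missing)
```

In this toy linear model the criterion is numerically zero whenever \(K\) contains every predictor with nonzero coefficients, and strictly positive when a relevant predictor is left out. That is the sense in which selection reduces to estimating a set. In practice \(C_1\) and \(C_{12}\) would be replaced by empirical estimates, and according to the summary the set is then estimated through penalized versions of the estimated criterion, a step this sketch does not attempt.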