Shrinkage for Extreme Partial Least-Squares

Julyan Arbel, Stéphane Girard, Hadrien Lorenzo
2024-05-24
Abstract: This work focuses on dimension-reduction techniques for modelling conditional extreme values. Specifically, we investigate the idea that extreme values of a response variable can be explained by nonlinear functions derived from linear projections of an input random vector. In this context, the estimation of projection directions is examined, as approached by the Extreme Partial Least Squares (EPLS) method, an adaptation of the original Partial Least Squares (PLS) method tailored to the extreme-value framework. Further, a novel interpretation of EPLS directions as maximum likelihood estimators is introduced, utilizing the von Mises-Fisher distribution applied to hyperballs. The dimension-reduction process is enhanced through the Bayesian paradigm, enabling the incorporation of prior information into the estimation of the projection direction. The maximum a posteriori estimator is derived in two specific cases, elucidating it as a regularization or shrinkage of the EPLS estimator. We also establish its asymptotic behavior as the sample size approaches infinity. A simulation study is conducted to assess the practical utility of the proposed method. It demonstrates the method's effectiveness even for moderately sized data sets in high-dimensional settings. Furthermore, we provide an illustrative example of the method's applicability using French farm income data, highlighting its efficacy in real-world scenarios.
Methodology, Statistics Theory, Computation
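To make the EPLS idea from the abstract more concrete, here is a minimal sketch of how a single projection direction could be estimated from tail observations. It assumes the EPLS criterion is to maximize \( \operatorname{cov}(w^\top X, Y \mid Y \ge y) \) over unit vectors \( w \). Since this equals \( w^\top \operatorname{cov}(X, Y \mid Y \ge y) \), the Cauchy-Schwarz inequality gives the normalized conditional covariance vector as the maximizer. The function name `epls_direction`, the parameter `tail_fraction`, the empirical-quantile threshold, and the simulated single-index model are all hypothetical choices made for illustration. The estimator in the paper may differ in weighting and threshold selection.

```python
import math
import random
from typing import List, Sequence


def epls_direction(X: Sequence[Sequence[float]], Y: Sequence[float],
                   tail_fraction: float = 0.1) -> List[float]:
    """Hypothetical EPLS-style direction: normalized empirical cov(X, Y | Y >= y_n).

    y_n is taken as the k-th largest response, k = ceil(tail_fraction * n)
    (an illustrative assumption, not necessarily the paper's choice).
    """
    n, p = len(Y), len(X[0])
    k = min(n, max(2, math.ceil(tail_fraction * n)))
    y_n = sorted(Y)[n - k]                        # k-th largest response
    tail = [i for i in range(n) if Y[i] >= y_n]   # indices of extreme responses
    m = len(tail)
    mean_y = sum(Y[i] for i in tail) / m
    mean_x = [sum(X[i][j] for i in tail) / m for j in range(p)]
    # Empirical conditional covariance between each covariate and the response.
    v = [sum((X[i][j] - mean_x[j]) * (Y[i] - mean_y) for i in tail) / m
         for j in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero conditional covariance: direction undefined")
    return [c / norm for c in v]


# Usage on simulated data (hypothetical single-index model):
# Y = exp(w0^T X) + small noise, with w0 = (1, 1, 0, 0, 0) / sqrt(2).
rng = random.Random(0)
p, n = 5, 20000
w0 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
Y = [math.exp(sum(a * b for a, b in zip(w0, x))) + 0.1 * rng.gauss(0.0, 1.0)
     for x in X]
w_hat = epls_direction(X, Y, tail_fraction=0.1)
cosine = abs(sum(a * b for a, b in zip(w_hat, w0)))  # alignment with the true direction
print(f"|<w_hat, w0>| = {cosine:.3f}")
```

Only the top 10% of responses enter the estimate here. This reflects the extreme-value focus: the direction is fitted to the tail rather than to the bulk of the data.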
What problem does this paper attempt to address?
This paper addresses the problem of modeling conditional extreme values in high-dimensional data. Specifically, it studies how extreme values of a response variable can be explained by nonlinear functions of linear projections of an input random vector. To this end, it builds on the Extreme Partial Least Squares (EPLS) method, an adaptation of the classical Partial Least Squares (PLS) method to the extreme-value framework.

### Main research questions:
1. **Dimension reduction in high-dimensional data**: When the dimension \( p \) of the data set is large and the sample size \( n \) is relatively small, classical regression techniques can overfit and yield unstable estimates. It is therefore necessary to find a low-dimensional subspace in which the projected covariates are strongly correlated with the response variable.
2. **Modeling conditional extreme values**: Tail events are rare by nature, so classical nonparametric estimators suffer from the scarcity of extreme observations, and the difficulty is compounded in high dimension. Dimension-reduction tools designed specifically for conditional extreme values are therefore needed.
3. **Incorporating prior information through the Bayesian paradigm**: Prior information is incorporated into the estimation of the projection direction to improve the stability and accuracy of the estimator.

### Solutions:
1. **EPLS method**: Following the PLS principle, EPLS estimates the linear combination of covariates that best explains the extreme values of the response variable.
2. **Maximum likelihood estimation**: The EPLS direction is reinterpreted as a maximum likelihood estimator under a von Mises-Fisher distribution on the hypersphere.
3. **Bayesian shrinkage**: Two prior distributions are introduced (a conjugate prior and a sparse prior), and the EPLS estimator is shrunk through the maximum a posteriori (MAP) estimator. The conjugate prior is a von Mises-Fisher distribution, while the sparse prior is based on the Laplace distribution to enforce sparsity.
4. **Asymptotic properties**: The asymptotic behavior of the MAP estimator is established as the sample size tends to infinity.

### Experimental verification:
- **Simulated data**: The proposed method is assessed on simulated data, with particular attention to moderately sized data sets in high-dimensional settings.
- **Real data**: The method is applied to French farm income data to illustrate its usefulness in practice.

### Key contributions:
- **Theory**: The EPLS estimator is reinterpreted as a maximum likelihood estimator, and prior information is introduced through the Bayesian paradigm.
- **Methodology**: Two shrinkage versions of EPLS (SEPaLS) are proposed, based on the conjugate prior and the sparse prior, respectively.
- **Application**: The method is illustrated on real data, highlighting its behavior in high-dimensional settings.

In conclusion, by reinterpreting EPLS as a maximum likelihood estimator and regularizing it through Bayesian shrinkage, the paper addresses the challenge of modeling conditional extreme values in high-dimensional data. It supports the method's effectiveness with theoretical analysis and experiments.
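The maximum likelihood and MAP estimators described above can be illustrated with a minimal sketch. It assumes that, as for a von Mises-Fisher model, the log-likelihood in the unit direction \( w \) is linear, \( \kappa \langle v, w \rangle \), where \( v \) is an unnormalized EPLS-type vector and \( \kappa > 0 \). Under this assumption:

- The MLE is \( v / \lVert v \rVert \).
- A conjugate prior \( \mathrm{vMF}(\mu_0, \kappa_0) \) gives the MAP \( (\kappa v + \kappa_0 \mu_0) / \lVert \kappa v + \kappa_0 \mu_0 \rVert \), which shrinks the MLE toward \( \mu_0 \).
- A Laplace-type prior \( \propto \exp(-\lambda \lVert w \rVert_1) \) restricted to the sphere gives a soft-thresholded, renormalized vector, which sets small coordinates to zero.

The construction of \( v \) and the exact prior parameterization in the paper are not reproduced here. The function names `vmf_mle`, `map_conjugate`, and `map_laplace` are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(u: Sequence[float]) -> List[float]:
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in u))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [c / norm for c in u]


def vmf_mle(v: Sequence[float]) -> List[float]:
    """Maximizer of kappa * <v, w> over unit vectors w (any kappa > 0)."""
    return _normalize(v)


def map_conjugate(v: Sequence[float], kappa: float,
                  mu0: Sequence[float], kappa0: float) -> List[float]:
    """MAP under an assumed vMF(mu0, kappa0) prior on w.

    Log-posterior is <kappa * v + kappa0 * mu0, w> + const, so the MAP is the
    normalized combination: kappa0 -> 0 gives the MLE, kappa0 -> inf gives mu0.
    """
    return _normalize([kappa * a + kappa0 * b for a, b in zip(v, mu0)])


def map_laplace(v: Sequence[float], kappa: float, lam: float) -> List[float]:
    """MAP under an assumed prior exp(-lam * ||w||_1) restricted to the sphere.

    Maximizes kappa * <v, w> - lam * ||w||_1 subject to ||w||_2 = 1 by
    soft-thresholding kappa * v at level lam and renormalizing.
    """
    a = [kappa * c for c in v]
    soft = [math.copysign(abs(c) - lam, c) if abs(c) > lam else 0.0 for c in a]
    if any(s != 0.0 for s in soft):
        return _normalize(soft)
    # Every coordinate thresholded to zero: the constrained maximizer then puts
    # all its mass on the coordinate with the largest |a_j|.
    j = max(range(len(a)), key=lambda i: abs(a[i]))
    w = [0.0] * len(a)
    w[j] = 1.0 if a[j] >= 0 else -1.0
    return w


# Usage with a hypothetical unnormalized direction vector.
v = [3.0, 0.5, -0.2, 2.0]
print(vmf_mle(v))
print(map_conjugate(v, kappa=1.0, mu0=[0.0, 1.0, 0.0, 0.0], kappa0=2.0))
print(map_laplace(v, kappa=1.0, lam=0.6))  # the two small coordinates become zero
```

In this parameterization, \( \kappa_0 \) and \( \lambda \) act as regularization strengths. Setting either to zero recovers the unpenalized direction \( v / \lVert v \rVert \). This is the sense in which the MAP estimators are shrinkage versions of the EPLS estimator.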