Sparse Extended Redundancy Analysis: Variable Selection via the Exclusive LASSO

Bing Cai Kok, Ji Sok Choi, Hyelim Oh, Ji Yeh Choi
DOI: https://doi.org/10.1080/00273171.2019.1694477
2019-11-28
Multivariate Behavioral Research
Abstract: Extended Redundancy Analysis is a statistical tool for exploring the directional relationships of multiple sets of exogenous variables on a set of endogenous variables. This approach posits that the endogenous and exogenous variables are related via latent components, each of which is extracted from a set of exogenous variables, that account for the maximum variation of the endogenous variables. However, it is often difficult to distinguish between the true variables that form the latent components and the false variables that do not, especially when the association between the true variables and the exogenous set is weak. To overcome this limitation, we propose a Sparse Extended Redundancy Analysis via the Exclusive LASSO that performs variable selection while maintaining model specification. We validate the performance of the proposed approach in a simulation study. Finally, the empirical utility of this approach is demonstrated through two examples: one on a study of youth academic achievement and the other on a text analysis of newspaper data.
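As a rough illustration of the penalty named in the abstract, the sketch below computes an exclusive-LASSO-style penalty: the sum, over variable sets, of the squared L1 norm of the weights within each set. This is the form commonly given for the exclusive LASSO in the general literature; the abstract does not state the exact objective used in the paper, so the function name, the grouping, and the weight values here are all hypothetical. The example shows why such a penalty tends to thin out variables within a set rather than drop a whole set, which is one plausible reading of "maintaining model specification".

```python
# Illustrative sketch only. The function name, grouping, and weights are
# assumptions for demonstration; this is not the authors' implementation.

def exclusive_lasso_penalty(weights, groups):
    """Sum over groups of the squared L1 norm of each group's weights.

    weights: list of numbers, one per exogenous variable.
    groups:  list of index lists; each inner list is one set of exogenous
             variables (one latent component in an ERA-style model).
    """
    total = 0.0
    for idx in groups:
        l1 = sum(abs(weights[j]) for j in idx)  # L1 norm within the group
        total += l1 ** 2                        # squared, then summed across groups
    return total


# Two hypothetical variable sets with three variables each.
groups = [[0, 1, 2], [3, 4, 5]]

# Same total absolute weight (2), distributed differently.
both_in_one_set = [1, 1, 0, 0, 0, 0]   # two active variables in the first set
one_per_set = [1, 0, 0, 1, 0, 0]       # one active variable in each set

print(exclusive_lasso_penalty(both_in_one_set, groups))  # 4.0
print(exclusive_lasso_penalty(one_per_set, groups))      # 2.0
```

An ordinary L1 penalty would score both weight vectors identically (2 in each case). The exclusive form charges more when several variables are active within the same set, so competition happens inside each set while every set can still keep a nonzero weight.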
psychology, experimental; mathematics, interdisciplinary applications; statistics & probability; social sciences, mathematical methods