Variable screening based on Gaussian Centered L-moments
Hyowon An, Kai Zhang, Hannu Oja, J. S. Marron
DOI: https://doi.org/10.48550/arXiv.1908.11048
2019-08-29
Abstract: An important challenge in big data is identification of important variables. In this paper, we propose methods of discovering variables with non-standard univariate marginal distributions. The conventional moments-based summary statistics can be well-adopted for that purpose, but their sensitivity to outliers can lead to selection based on a few outliers rather than distributional shape such as bimodality. To address this type of non-robustness, we consider the L-moments. Using these in practice, however, has a limitation because they do not take zero values at the Gaussian distributions to which the shape of a marginal distribution is most naturally compared. As a remedy, we propose Gaussian Centered L-moments which share advantages of the L-moments but have zeros at the Gaussian distributions. The strength of Gaussian Centered L-moments over other conventional moments is shown in theoretical and practical aspects such as their performances in screening important genes in cancer genetics data.
Methodology, Statistics Theory, Computation
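The abstract's remark that conventional L-moments "do not take zero values at the Gaussian distributions" can be illustrated with a minimal sketch. The code below computes the first four sample L-moments from the standard unbiased probability-weighted-moment estimators and compares the L-kurtosis of a simulated Gaussian sample with its known population value, 30/pi * arctan(sqrt(2)) - 9, roughly 0.1226. The function name `sample_l_moments`, the sample size, and the seed are illustrative assumptions. This sketch shows only the conventional L-moments; it does not implement the paper's Gaussian Centered L-moments.

```python
import math
import random


def sample_l_moments(data):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Uses the unbiased probability-weighted-moment estimators b0..b3
    computed from the order statistics. Hypothetical helper for
    illustration only.
    """
    x = sorted(float(v) for v in data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x, start=1):
        w1 = (i - 1) / (n - 1)          # weight for b1
        w2 = w1 * (i - 2) / (n - 2)     # weight for b2
        w3 = w2 * (i - 3) / (n - 3)     # weight for b3
        b0 += v
        b1 += w1 * v
        b2 += w2 * v
        b3 += w3 * v
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


# Population L-kurtosis of any Gaussian distribution (approximately 0.1226).
GAUSSIAN_L_KURTOSIS = 30 / math.pi * math.atan(math.sqrt(2)) - 9

rng = random.Random(0)  # fixed seed, chosen arbitrarily
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
l1, l2, l3, l4 = sample_l_moments(sample)
l_skew = l3 / l2  # L-skewness: near 0 for a Gaussian sample
l_kurt = l4 / l2  # L-kurtosis: near 0.1226, not 0, for a Gaussian sample

print(f"L-skewness: {l_skew:.4f}")
print(f"L-kurtosis: {l_kurt:.4f} (Gaussian value {GAUSSIAN_L_KURTOSIS:.4f})")
```

Because the Gaussian L-kurtosis is a nonzero constant, a screening rule built on the raw L-kurtosis has no natural zero reference for "Gaussian-like" variables, which is the limitation the abstract says the Gaussian Centered L-moments are meant to remove.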