Comparison of Penalty-based Feature Selection Approach on High Throughput Biological Data.

Ningya Wang, Wenbin Zhou, Jiamin Wu, Shengjia Chen, Ziling Fan
DOI: https://doi.org/10.1145/3397391.3397404
2020-01-01
Abstract: Feature selection has become a critical step in training models on high-throughput biological data. Penalty-based approaches are among the most important categories of feature selection techniques because the sets of features they select are sparse: they automatically set small estimated coefficients to zero, which reduces model complexity. The many penalty-based methods differ in their benefits, drawbacks, and statistical properties, so choosing among them for a given situation has become a problem. In this paper, we therefore compare and evaluate four popular penalty-based methods on three metrics: accuracy, robustness, and the robustness-performance trade-off (RPT). Because each method has its own statistical properties, our comparison may help researchers choose a method when working with high-throughput data. The results show that LASSO achieves the best robustness and accuracy of the four feature selection methods on high-throughput TCGA datasets.
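The abstract notes that penalty-based methods set small estimated coefficients exactly to zero. A minimal sketch of why, using LASSO as the example and assuming the common objective (1/(2n))·||y − Xβ||² + λ·||β||₁ solved by cyclic coordinate descent: every coordinate update passes through a soft-thresholding step that returns exactly zero whenever a feature's partial correlation with the residual is at most λ. The function names, the toy data, and λ = 0.5 are illustrative assumptions, not the authors' implementation, and the paper's other three methods are not shown.

```python
# Illustrative LASSO sketch (assumed objective scaling); not the paper's code.

def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; return exactly 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each and y is a list of n responses.
    No intercept is fitted, so X and y are assumed to be centred already.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # residual y - X beta, with beta = 0 initially
    # (1/n) * squared norm of each feature column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no signal
            # correlation of feature j with the residual that excludes feature j itself
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync with beta
                beta[j] = new_bj
    return beta


# Toy data: two orthogonal, centred columns; y = 3 * x0 + 0.2 * x1
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.2, 2.8, -2.8, -3.2]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # approximately [2.5, 0.0]
```

Because the two toy columns are orthogonal, feature 1's correlation with the response (0.2) falls below λ = 0.5, so its coefficient is set exactly to zero, while feature 0's coefficient is shrunk from 3 to 2.5 rather than removed.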
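The abstract names robustness and RPT as evaluation metrics but does not define them. A minimal sketch under two assumptions: robustness is taken as the mean pairwise Jaccard similarity between the feature sets a method selects on different resampled training sets, and RPT is taken as the F-measure-style weighted harmonic mean RPT_β = (1 + β²) · robustness · accuracy / (β² · robustness + accuracy). The paper's exact definitions may differ; the function names and the feature labels g1 to g4 are hypothetical.

```python
# Assumed metric definitions (Jaccard-based robustness, F-measure-style RPT);
# the paper's own formulas may differ.
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def robustness(selected_sets) -> float:
    """Mean pairwise Jaccard similarity of the feature sets selected on different subsamples."""
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two selected feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def rpt(robust: float, accuracy: float, beta: float = 1.0) -> float:
    """F-measure-style robustness-performance trade-off.

    beta = 1 gives the harmonic mean of robustness and accuracy;
    beta > 1 shifts the weight toward accuracy, beta < 1 toward robustness.
    """
    denom = beta ** 2 * robust + accuracy
    if denom == 0.0:
        return 0.0
    return (1 + beta ** 2) * robust * accuracy / denom


# Hypothetical feature sets picked by one method on three resampled training sets
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g3", "g4"}]
r = robustness(runs)  # each pair shares 2 of 4 features, so r = 0.5
print(r, rpt(r, accuracy=0.8))
```

With β = 1 the RPT rewards a method only when it is both stable and accurate: a method whose selected features share nothing between runs has robustness 0 and therefore an RPT of 0, however accurate it is.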