A Grouped Structure-based Regularized Regression Model for Text Categorization

Wenbin Zheng, Yuntao Qian, Minchao Ye
DOI: https://doi.org/10.4304/jsw.7.9.2119-2124
2012-01-01
Journal of Software
Abstract: Lasso regularization has been used successfully in regression models for feature selection; however, lasso treats all variables as independent and uncorrelated, so when features are highly correlated it yields an excessively sparse solution (i.e., some important discriminating features may be discarded). This paper proposes a novel sparse model for text categorization. We first construct a grouped structure according to the correlation of text features, and then embed the structure into a regression model in a between- and within-group sparse manner. The goal is that groups containing many discriminating features are selected even if the features in these groups are highly correlated, while the noise within the selected groups is discarded at the same time, which benefits classification. Experimental results show that the proposed method achieves a good tradeoff between performance and sparsity on three benchmark data sets.
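To illustrate what between- and within-group sparsity means in practice, the following is a minimal sketch of the proximal step for a standard, unweighted sparse-group-lasso penalty, lam_l1 * ||w||_1 + lam_group * sum_g ||w_g||_2. This is an assumption made for illustration: the abstract does not give the paper's exact objective, group weights, or solver, and the function name, parameter names, and toy groups below are hypothetical.

```python
import numpy as np

def sparse_group_prox(w, groups, lam_group, lam_l1):
    """Proximal step for lam_l1*||w||_1 + lam_group*sum_g ||w_g||_2.

    Assumed standard sparse-group-lasso form with non-overlapping groups;
    not necessarily the formulation used in the paper.
    """
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        # Within-group sparsity: soft-threshold each coefficient.
        v = np.sign(w[idx]) * np.maximum(np.abs(w[idx]) - lam_l1, 0.0)
        norm = np.linalg.norm(v)
        # Between-group sparsity: shrink the whole group, or drop it
        # entirely when its remaining norm does not exceed lam_group.
        if norm > lam_group:
            out[idx] = (1.0 - lam_group / norm) * v
    return out

# Toy coefficients: a strong group (indices 0-2) and a weak group (3-4).
w = np.array([3.0, 0.3, -2.0, 0.1, -0.2])
groups = [np.array([0, 1, 2]), np.array([3, 4])]
shrunk = sparse_group_prox(w, groups, lam_group=1.0, lam_l1=0.4)
```

In this toy case the weak group is removed as a whole, while inside the retained group the small coefficient at index 1 is zeroed and the two large ones are only shrunk. That is the behavior the abstract describes: informative groups are kept and the noise within them is discarded.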