A hybrid feature selection algorithm applied to high-dimensional imbalanced small-sample data classification

Fang Feng, Qingquan Lv, Mingsong Wang, Xuhui Yang, Qingguo Zhou, Rui Zhou
2019-11-18
Abstract: With the rapid development of microarray technology and interdisciplinary science, microarray technology can now be used to predict diseases. It offers high speed, high efficiency, and reliability in disease prediction. However, microarray data are usually high-dimensional with small samples, and the samples are often imbalanced, which poses considerable difficulties for researchers. To address these problems, this paper proposes a Filter-Wrapper hybrid feature selection algorithm, Union Information Gini Cost-sensitive Feature Selection General Vector Machine (UIG-CFGVM), to tackle the high-dimensional imbalanced small-sample problem. The improved hybrid algorithm works as follows. First, the most common features are removed by the proposed hybrid filter algorithm UIG, which is obtained from Information Gain (Info) and Gini Index (Gini). Second, Cost …
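The filter stage described in the abstract can be illustrated with a minimal sketch. The abstract states only that UIG is obtained from Information Gain and Gini Index, so the combination rule here (the union of the top-k features under each score) is an assumption suggested by the word "Union" in the name, not a confirmed detail of the method. The median-split discretization, the function names (`split_gain`, `top_k`, `uig_filter`), the parameter `k`, and the toy data are likewise hypothetical.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, impurity):
    """Impurity reduction from splitting one feature at its median.

    The median split is an assumed discretization, chosen for brevity.
    With `entropy` this is information gain; with `gini` it is the
    decrease in Gini impurity.
    """
    cut = median(values)
    hi = [y for x, y in zip(values, labels) if x > cut]
    lo = [y for x, y in zip(values, labels) if x <= cut]
    if not hi or not lo:  # constant feature: no usable split
        return 0.0
    n = len(labels)
    after = len(hi) / n * impurity(hi) + len(lo) / n * impurity(lo)
    return impurity(labels) - after


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def uig_filter(columns, labels, k):
    """Hypothetical UIG-style filter: union of the top-k features per score."""
    info = [split_gain(col, labels, entropy) for col in columns]
    gin = [split_gain(col, labels, gini) for col in columns]
    return sorted(top_k(info, k) | top_k(gin, k))


# Toy data: three feature columns over eight samples (illustrative only).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [5, 5, 5, 5, 5, 5, 5, 5],                  # constant, carries no information
    [0, 1, 0, 1, 0, 1, 0, 1],                  # unrelated to the labels
    [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4],  # separates the two classes
]
selected = uig_filter(columns, labels, k=1)
```

Features outside `selected` would be dropped before the wrapper stage. Taking the union rather than the intersection keeps any feature that either criterion ranks highly, which is a cautious choice when samples are scarce.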