An improvement direction for filter selection techniques using information theory measures and quadratic optimization

Waad Bouaguel, Ghazi Bel Mufti
DOI: https://doi.org/10.48550/arXiv.1208.3689
2012-08-18
Abstract: Filter selection techniques are known for their simplicity and efficiency. However, these methods do not take inter-feature redundancy into account. Consequently, redundant features that are not removed remain in the final classification model and lower its generalization performance. In this paper we propose a mathematical optimization method that reduces inter-feature redundancy and maximizes the relevance between each feature and the target variable.
Machine Learning, Information Theory
What problem does this paper attempt to address?
This paper addresses the problem of redundant features in existing filter feature selection methods. Specifically:

1. **Feature Redundancy Problem**: Traditional filter feature selection methods are simple and efficient, but they usually do not consider redundancy between features (i.e., the similarity among the features themselves). Redundant features that are not removed therefore remain in the final classification model and lower its generalization performance.
2. **Improving Classification Performance**: Improving the classification model requires a method that reduces the redundancy between features while maximizing the relevance of each feature to the target variable.

To solve these problems, the authors propose a method based on information-theoretic measures and quadratic optimization. The optimization objective is built from two conditions:

- **Minimum Redundancy Condition**: \[ P_1 = \min \frac{1}{|S|(|S| - 1)} \sum_{i,j \in S,\, i \neq j} M(x_i, x_j) \] where \(M(x_i, x_j)\) is the similarity between features \(x_i\) and \(x_j\), and \(|S|\) is the number of features in the feature subset \(S\).
- **Maximum Relevance Condition**: \[ P_2 = \max \frac{1}{|S|} \sum_{i \in S} M(x_i, y) \] where \(M(x_i, y)\) is the similarity between feature \(x_i\) and the target variable \(y\).

Combining the two conditions gives a single objective function: \[ f(x) = (1 - \alpha) \cdot P_1 - \alpha \cdot P_2 \] where \(\alpha \in [0,1]\) is a balancing parameter that sets the trade-off between redundancy and relevance. Because the relevance term enters with a negative sign, minimizing \(f\) lowers redundancy and raises relevance at the same time; \(\alpha = 0\) considers only redundancy and \(\alpha = 1\) only relevance.
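To make the redundancy and relevance terms concrete, the following minimal sketch evaluates the averaged redundancy, the averaged relevance, and the combined objective for candidate subsets of a toy problem. The similarity values in `M` and `relevance` are invented for illustration (the source gives no data), the function names are hypothetical, and the exhaustive search over pairs is only a stand-in for tiny examples, not the quadratic-programming approach the paper uses.

```python
from itertools import combinations

# Hypothetical pairwise similarities M(x_i, x_j); features 0 and 1 are near-duplicates.
M = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical similarities M(x_i, y) between each feature and the target.
relevance = [0.8, 0.75, 0.6]


def redundancy(subset, M):
    """Average similarity over ordered pairs i != j in the subset (needs >= 2 features)."""
    n = len(subset)
    total = sum(M[i][j] for i in subset for j in subset if i != j)
    return total / (n * (n - 1))


def mean_relevance(subset, relevance):
    """Average feature-target similarity over the subset."""
    return sum(relevance[i] for i in subset) / len(subset)


def objective(subset, M, relevance, alpha):
    """(1 - alpha) * redundancy - alpha * relevance; lower is better."""
    return (1 - alpha) * redundancy(subset, M) - alpha * mean_relevance(subset, relevance)


alpha = 0.5
candidates = list(combinations(range(len(relevance)), 2))
best_subset = min(candidates, key=lambda s: objective(s, M, relevance, alpha))

for s in candidates:
    print(s, round(objective(s, M, relevance, alpha), 4))
print("lowest objective:", best_subset)
```

With these toy numbers, the pair made of the two near-duplicate features scores worse than either pair that includes the independent feature, even though the near-duplicates have the highest individual relevance.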
The method solves the optimization problem defined by \(f\) through quadratic programming, selecting the optimal feature subset in order to improve the performance and generalization ability of the classification model.
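As a rough illustration of how a quadratic program can replace an explicit subset search, the sketch below assumes a relaxed formulation that this summary does not spell out: each feature receives a non-negative weight \(x_i\), the weights sum to one, and the objective becomes \((1-\alpha)\,x^\top Q x - \alpha\,F^\top x\), with \(Q\) holding the pairwise similarities and \(F\) the feature-target similarities. The formulation, the projected-gradient solver, the toy data, and every function name are assumptions made for illustration; a real application would more likely call a dedicated QP solver.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the largest k that satisfies the test
    return [max(v_i - theta, 0.0) for v_i in v]


def solve_weights(Q, F, alpha, iterations=2000):
    """Projected gradient descent on (1 - alpha) * x'Qx - alpha * F'x over the simplex.

    Assumes Q is symmetric and positive semidefinite, so the problem is convex.
    """
    n = len(F)
    # Gershgorin bound on the largest eigenvalue of the Hessian 2 * (1 - alpha) * Q.
    lipschitz = 2 * (1 - alpha) * max(sum(abs(q) for q in row) for row in Q)
    step = 1.0 / max(lipschitz, 1e-12)
    x = [1.0 / n] * n
    for _ in range(iterations):
        grad = [
            2 * (1 - alpha) * sum(Q[i][j] * x[j] for j in range(n)) - alpha * F[i]
            for i in range(n)
        ]
        x = project_to_simplex([x[i] - step * grad[i] for i in range(n)])
    return x


# Hypothetical toy data: features 0 and 1 are near-duplicates, feature 2 is independent.
Q = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
F = [0.8, 0.75, 0.6]

weights = solve_weights(Q, F, alpha=0.5)
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
print("weights:", [round(w, 4) for w in weights])
print("ranking:", ranking)
```

Ranking features by weight and keeping the top few is one way to turn the relaxed solution into a subset. On this toy data the weaker of the two near-duplicate features receives the smallest weight.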