A graph partitioning-based hybrid feature selection method in microarray datasets
Abdelali Oubaouzine, Tayeb Ouaderhman, Hasna Chamlal
DOI: https://doi.org/10.1007/s10115-024-02292-3
IF: 2.7
2024-12-10
Knowledge and Information Systems
Abstract: Feature selection is one of the foremost methodologies in dimensionality reduction; its primary objective is to extract pertinent features from a large dataset. The process is driven by two goals: reducing the feature count while enhancing classification performance. Graph mining techniques enrich feature selection by uncovering hidden correlations and facilitating the discovery of more informative and efficient variables, ultimately improving the classification accuracy of machine learning models. This paper introduces a hybrid approach based on graph partitioning and TOPSIS to identify relevant features in high-dimensional datasets. The proposed algorithm, Mutual Information Decomposition based on TOPSIS (MIDBT), can be described in two steps. The first step, the filter phase, eliminates features that do not align with the target: we apply the mRMR filter to retain the K best features of a given dataset, where K is defined by the user. In the second step, we propose a graph-mining procedure that introduces a new weight between vertices combining relevance and redundancy. To pinpoint the optimal subset of features, we formulate the task as a multi-criteria decision problem solved with the TOPSIS method, with two objectives to maximize and one to minimize simultaneously: modularity, accuracy, and diameter. Finally, applying a forward selection algorithm to the optimal subset yields the features with high classification performance and the fewest variables possible. To evaluate the efficiency of MIDBT, we benchmark it against several advanced feature selection techniques on 10 high-dimensional datasets. The experimental results validate the effectiveness of the MIDBT method.
computer science, information systems, artificial intelligence
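
The TOPSIS ranking mentioned in the abstract can be illustrated with a minimal, generic sketch. This is textbook TOPSIS, not the authors' implementation. The function name `topsis`, the equal weights, and the candidate values are hypothetical. It is also an assumption that modularity and accuracy are the two maximized criteria and diameter the minimized one.

```python
import math

def topsis(matrix, weights, benefit):
    """Generic TOPSIS: return a closeness score per alternative (higher is better).

    matrix  : one row per candidate, one column per criterion
    weights : criterion weights (assumed; the abstract does not state them)
    benefit : True where a criterion is maximized, False where it is minimized
    """
    n_crit = len(weights)
    # Vector-normalize each column; fall back to 1.0 for an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Ideal and anti-ideal points, criterion by criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal point
        d_neg = math.dist(row, anti)    # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores

# Hypothetical candidate subsets: (modularity, accuracy, diameter).
candidates = [
    [0.40, 0.90, 3],
    [0.35, 0.85, 5],
    [0.20, 0.80, 6],
]
weights = [1 / 3, 1 / 3, 1 / 3]   # equal weights, an assumption
benefit = [True, True, False]     # assumed: maximize, maximize, minimize

scores = topsis(candidates, weights, benefit)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

In this toy input the first candidate is best on all three criteria, so it coincides with the ideal point and is ranked first. With real candidates the criteria usually conflict, which is the case TOPSIS is meant to arbitrate.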