Intuitionistic Fuzzy Universum Twin Support Vector Machine for Imbalanced Data

A. Quadir, M. Tanveer
2024-10-27
Abstract: One of the major difficulties in machine learning is classifying imbalanced datasets. Imbalance may lead to biased models, where the training process is dominated by the majority class, resulting in inadequate representation of the minority class. The universum twin support vector machine (UTSVM) produces a model biased towards the majority class; as a result, its performance on the minority class is often poor, because minority samples may be mistakenly treated as noise. Moreover, UTSVM is not proficient in handling datasets that contain outliers and noise. Inspired by the concept of incorporating prior information about the data and employing an intuitionistic fuzzy membership scheme, we propose intuitionistic fuzzy universum twin support vector machines for imbalanced data (IFUTSVM-ID). We use an intuitionistic fuzzy membership scheme to mitigate the impact of noise and outliers. Moreover, to tackle the problem of imbalanced class distribution, data oversampling and undersampling methods are utilized. Prior knowledge about the data is provided by universum data, which leads to better generalization performance. UTSVM is susceptible to overfitting because its primal formulation omits the structural risk minimization (SRM) principle. The proposed IFUTSVM-ID model, in contrast, incorporates the SRM principle by adding regularization terms, effectively addressing the issue of overfitting. We conduct a comprehensive evaluation of the proposed IFUTSVM-ID model on benchmark datasets from KEEL and compare it with existing baseline models. Furthermore, to assess the effectiveness of the proposed IFUTSVM-ID model in diagnosing Alzheimer's disease (AD), we apply it to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Experimental results showcase the superiority of the proposed IFUTSVM-ID model compared to the baseline models.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the tendency of traditional machine-learning classifiers (such as support vector machines and twin support vector machines) to favour the majority class on imbalanced datasets, which causes minority-class samples to be misclassified. Specifically:

1. **Classification of imbalanced datasets**: In imbalanced datasets, majority-class samples far outnumber minority-class samples. Traditional machine-learning models are dominated by the majority class during training and therefore classify minority-class samples poorly.
2. **Influence of noise and outliers**: Traditional models perform poorly on datasets containing noise and outliers, which may lead to overfitting or a decline in generalization ability.
3. **Lack of the structural risk minimization principle**: Some traditional models (such as UTSVM) do not include the structural risk minimization (SRM) principle in their primal formulations, which makes them prone to overfitting.

To solve these problems, the authors propose a new model, the Intuitionistic Fuzzy Universum Twin Support Vector Machine for Imbalanced Data (IFUTSVM-ID). The model mitigates the influence of noise and outliers through an intuitionistic fuzzy membership scheme and uses Universum data to supply prior information, improving generalization performance. In addition, IFUTSVM-ID implements structural risk minimization by introducing a regularization term, which addresses the overfitting problem.

### Main contributions:

- Propose a new Intuitionistic Fuzzy Universum Twin Support Vector Machine (IFUTSVM-ID), which mitigates the influence of noise and outliers through intuitionistic fuzzy membership weights.
- Apply undersampling in the constraints and kernel matrices so that the two classes carry equal weight when the classifier is constructed, while also reducing computational complexity.
- Incorporate the structural risk minimization principle into the primal formulation of the model by introducing a regularization term, which improves generalization ability and robustness.
- Conduct experimental verification on multiple real-world datasets; the results show that the IFUTSVM-ID model is superior to the existing baseline models.
- Apply IFUTSVM-ID to the diagnosis of Alzheimer's disease; the experimental results show that it is superior to other models in terms of accuracy.

Through these improvements, IFUTSVM-ID achieves better classification performance on imbalanced datasets and is more robust in the presence of noise and outliers.
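
The intuitionistic fuzzy weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one common distance-based scheme from the intuitionistic fuzzy twin SVM literature, computed with Euclidean distance in input space rather than in a kernel feature space. The function names `centroid` and `if_scores`, the neighbourhood radius `alpha`, and the small constant `delta` are hypothetical names chosen for this example.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def if_scores(own, other, alpha=1.0, delta=1e-6):
    """Assumed intuitionistic fuzzy score in [0, 1] for each sample in `own`.

    own   -- samples of the class being weighted
    other -- samples of the opposite class
    alpha -- neighbourhood radius used to count nearby samples (assumption)
    delta -- small constant that keeps the membership strictly positive
    """
    center = centroid(own)
    dists = [math.dist(x, center) for x in own]
    radius = max(dists)
    scores = []
    for x, d in zip(own, dists):
        # Membership: close to 1 near the class centre, close to 0 at its edge.
        mu = 1.0 - d / (radius + delta)
        # Share of opposite-class samples among the neighbours of x.
        near_own = sum(1 for p in own if math.dist(x, p) <= alpha)
        near_other = sum(1 for p in other if math.dist(x, p) <= alpha)
        rho = near_other / (near_own + near_other)  # near_own counts x itself
        # Non-membership grows when x is far from its centre and
        # surrounded by the opposite class.
        nu = (1.0 - mu) * rho
        if nu == 0.0:
            score = mu
        elif mu <= nu:
            score = 0.0  # treated as noise or an outlier
        else:
            score = (1.0 - nu) / (2.0 - mu - nu)
        scores.append(score)
    return scores


# Toy data: the last minority sample sits inside the majority cluster.
minority = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (3.0, 3.0)]
majority = [(3.0, 3.1), (3.2, 3.0), (2.9, 2.9), (3.1, 3.2)]
scores = if_scores(minority, majority)
print(scores)
```

In this toy example the minority sample placed inside the majority cluster receives a score of zero, so a classifier that multiplies each sample's slack penalty by its score would effectively ignore it, while the three clustered minority samples keep positive weights.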