Unbalanced data processing using deep sparse learning technique

Xing Li, Lei Zhang
DOI: https://doi.org/10.1016/j.future.2021.05.034
IF: 7.307
2021-01-01
Future Generation Computer Systems
Abstract: In view of the limitations of existing unbalanced data modeling techniques, a hybrid sampling algorithm that encodes the sparse boundary samples is proposed. Boundary samples are recognized by calculating the boundary factor of each sample, based on which the sample space is divided into a boundary domain and a non-boundary domain. The negative samples in the non-boundary domain are undersampled. Owing to the sparsity of the samples in the boundary domain, the positive samples there are oversampled by leveraging the synthetic minority oversampling technique (SMOTE) using the maximum distance, which retains the information of the positive samples to the maximum extent. The algorithm first introduces the concept of support k-outlier degree to identify the boundary point set and the non-boundary point set in the data set. It then leverages the improved SMOTE algorithm to synthesize a new point set by taking the boundary points of the minority class as the target samples. It further applies a distance-based sampling algorithm to the non-boundary points of the majority class in order to achieve balance between the classes. Taking recall, F1 value, G-mean and AUC as the evaluation indexes, the support vector machine (SVM) is used as the learning algorithm to verify the effectiveness of the method on five data sets with different balance degrees. Compared with four other methods, the experimental results show that the proposed algorithm performs well on the different data sets, with an average increase of 3.5%, so the method can effectively deal with unbalanced data. © 2021 Published by Elsevier B.V.
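The hybrid sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration only: `boundary_factor` is a hypothetical proxy (the share of a sample's k nearest neighbours that carry the other label) standing in for the paper's boundary factor and support k-outlier degree, the oversampling step uses plain SMOTE-style interpolation rather than the maximum-distance variant, whose details the abstract does not give, and the non-boundary negatives are thinned by simple random selection. Labels are assumed to be 1 for the positive (minority) class and 0 for the negative (majority) class.

```python
import numpy as np


def boundary_factor(X, y, k=5):
    """Hypothetical proxy: share of each sample's k nearest neighbours with a different label."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    return (y[nn] != y[:, None]).mean(axis=1)


def hybrid_sample(X, y, k=5, threshold=0.2, seed=0):
    """Oversample boundary positives and undersample non-boundary negatives (illustrative only)."""
    rng = np.random.default_rng(seed)
    boundary = boundary_factor(X, y, k) >= threshold
    pos, neg = (y == 1), (y == 0)
    target = len(y) // 2                 # aim for two classes of equal size

    # Oversampling: interpolate between a boundary positive and a nearby positive.
    P = X[pos]
    seeds = P[boundary[pos]] if (boundary & pos).any() else P
    n_new = max(target - len(P), 0)
    k_eff = min(k, len(P) - 1)
    synth = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        s = seeds[rng.integers(len(seeds))]
        if k_eff < 1:                    # a single positive: nothing to interpolate with
            synth[i] = s
            continue
        order = np.argsort(np.linalg.norm(P - s, axis=1))
        t = P[rng.choice(order[1:k_eff + 1])]   # skip index 0, the seed itself
        synth[i] = s + rng.random() * (t - s)

    # Undersampling: keep every boundary negative, thin out the non-boundary ones.
    N_b, N_in = X[neg & boundary], X[neg & ~boundary]
    n_keep = min(len(N_in), max(target - len(N_b), 0))
    N = np.vstack([N_b, N_in[rng.permutation(len(N_in))[:n_keep]]])

    X_new = np.vstack([P, synth, N])
    y_new = np.concatenate([np.ones(len(P) + n_new, dtype=int),
                            np.zeros(len(N), dtype=int)])
    return X_new, y_new


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    X_bal, y_bal = hybrid_sample(X, y)
    print("class counts before:", np.bincount(y), "after:", np.bincount(y_bal))
```

The balanced output could then be passed to any classifier, such as the SVM named in the abstract. The `threshold` and `k` parameters are arbitrary choices for this sketch and would need tuning on real data.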