A neural network learning algorithm for highly imbalanced data classification

Zhan ao Huang, Yongsheng Sang, Yanan Sun, Jiancheng Lv
DOI: https://doi.org/10.1016/j.ins.2022.08.074
IF: 8.1
2022-01-01
Information Sciences
Abstract: The imbalanced data problem exists in many real-world datasets. Neural networks are a popular method for classifying imbalanced data. However, data imbalance often negatively affects neural networks, and this problem is exacerbated when the data are highly imbalanced. Existing neural network approaches for handling this problem rely heavily on rebalancing or reweighting known data. Essentially, these strategies focus on recovering the characteristics of balanced data. However, because positive samples are so scarce, the problem of insufficient empirical representation has not been thoroughly considered. Therefore, to solve the problem of highly imbalanced data, we explore the characteristics of the gradient norm in gradient descent optimization. We find that the key indicator of balanced data is that the gradient norms of the positive and negative classes are approximately equal. Specifically, neural networks can classify known, highly imbalanced data by considering the unit gradient directions of the positive and negative classes. Furthermore, a local boundary expansion strategy is considered to alleviate the insufficient empirical representation problem of the positive class. Here, we propose a controllable gradient rotation strategy to realize local boundary expansion for positive samples. We validate the proposed approach on 34 highly imbalanced datasets and two synthetic datasets, and the proposed method exhibits impressive performance.
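The idea of equalizing the gradient norms of the two classes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain logistic-regression model in place of a neural network, and the helper names (`class_gradient`, `unit`, `balanced_direction`, `train`), the learning rate, and the synthetic data are all hypothetical. The gradient rotation strategy is not shown. Each class's mean gradient is rescaled to unit length before the two are summed, so the 1000 negative samples cannot outweigh the 10 positive ones in the update direction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def class_gradient(w, X, y):
    """Mean logistic-loss gradient over the samples in (X, y)."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

def unit(v, eps=1e-12):
    """Rescale v to (approximately) unit Euclidean norm."""
    return v / (np.linalg.norm(v) + eps)

def balanced_direction(w, X, y):
    """Sum of the unit gradient directions of the two classes.

    Assumption for illustration: labels are 1 (positive) and 0 (negative),
    and both classes are present in the data.
    """
    g_pos = class_gradient(w, X[y == 1], y[y == 1])
    g_neg = class_gradient(w, X[y == 0], y[y == 0])
    return unit(g_pos) + unit(g_neg)

def train(X, y, lr=0.1, steps=500):
    """Gradient descent along the class-balanced direction."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * balanced_direction(w, X, y)
    return w

# Hypothetical synthetic data with a 100:1 imbalance ratio.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=-1.0, size=(1000, 2))
X_pos = rng.normal(loc=+1.0, size=(10, 2))
features = np.vstack([X_neg, X_pos])
X = np.hstack([features, np.ones((len(features), 1))])  # append a bias column
y = np.concatenate([np.zeros(1000), np.ones(10)])

w = train(X, y)
```

Because both class gradients enter with the same norm, the step size no longer depends on the class ratio; a reweighting scheme would instead need the ratio as an explicit hyperparameter.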