Minority Sub-Region Estimation-Based Oversampling for Imbalance Learning

Yi Sun, Lijun Cai, Bo Liao, Wen Zhu
DOI: https://doi.org/10.1109/tkde.2020.3010013
IF: 9.235
2022-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: The class imbalance problem, characterized by a distribution skewed towards the majority class, has emerged as a challenge in recent years. Many oversampling techniques have been proposed to cope with this problem, and some of them combine the oversampling procedure with a clustering algorithm, which guarantees that new synthetic samples are generated within clusters. However, samples that are far apart but belong to the same minority sub-region are generally clustered into different groups, owing to the characteristics of the clustering algorithm itself. Therefore, the subsequent oversampling procedure is mostly carried out in incomplete minority sub-regions, so the synthetic samples do not cover the whole minority region well. To the best of our knowledge, no existing algorithm is designed to directly estimate minority sub-regions for the class imbalance problem. Thus, a new grouping algorithm, named Direction Distribution-based Minority Sub-region Estimation (DDMSE), is first proposed. The new algorithm exploits the intuitive observation that minority samples in the same sub-region are distributed in almost the same direction relative to the majority samples. It uses this observation to estimate minority sub-regions, thereby avoiding the negative impact of the distance factor found in clustering algorithms. Finally, new synthetic samples are generated in those minority sub-regions. Experimental results on real-world datasets show performance comparable to other state-of-the-art oversampling methods.
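The direction-based intuition in the abstract can be illustrated with a small sketch. This is not the paper's DDMSE algorithm, whose details the abstract does not give. It is a toy approximation built on several assumptions: the pairwise score (mean cosine similarity of the directions from each majority sample to two minority samples), the `threshold` parameter, the connected-components grouping, and the SMOTE-style linear interpolation are all hypothetical choices made for illustration, as are the function names `direction_groups` and `oversample_within_groups`.

```python
import numpy as np


def direction_groups(X_min, X_maj, threshold=0.9):
    """Group minority samples by direction rather than distance.

    Assumption (not from the paper): two minority samples are linked when
    the directions pointing to them from the majority samples agree, i.e.
    their mean cosine similarity is at least `threshold`.
    """
    # Unit direction vectors from every majority sample to every minority sample.
    diff = X_min[:, None, :] - X_maj[None, :, :]        # (n_min, n_maj, d)
    norm = np.linalg.norm(diff, axis=2, keepdims=True)
    unit = diff / np.maximum(norm, 1e-12)
    # sim[a, b] = cosine similarity of the two directions, averaged over majority samples.
    sim = np.einsum("amd,bmd->ab", unit, unit) / X_maj.shape[0]

    # Connected components of the thresholded similarity graph.
    labels = -np.ones(len(X_min), dtype=int)
    current = 0
    for i in range(len(X_min)):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero((sim[j] >= threshold) & (labels < 0)):
                labels[k] = current
                stack.append(k)
        current += 1
    return labels


def oversample_within_groups(X_min, labels, n_new, rng):
    """Generate synthetic samples by interpolating inside one group at a time."""
    new_samples = []
    for _ in range(n_new):
        a = rng.integers(len(X_min))                     # random seed sample
        partners = np.flatnonzero(labels == labels[a])   # same estimated sub-region
        b = rng.choice(partners)
        t = rng.random()                                 # interpolation weight in [0, 1)
        new_samples.append(X_min[a] + t * (X_min[b] - X_min[a]))
    return np.array(new_samples)
```

Because the grouping depends only on directions, two minority samples that are far apart but lie on the same side of the majority class can still share a group, which is the behaviour the abstract contrasts with distance-based clustering. Restricting interpolation to a single group keeps synthetic samples from being placed between separate sub-regions.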