SNN-PDM: An Improved Probability Density Machine Algorithm Based on Shared Nearest Neighbors Clustering Technique

Shiqi Wu, Hualong Yu, Yan Gu, Changbin Shao, Shang Gao
DOI: https://doi.org/10.1007/s00357-024-09474-2
IF: 1.333
2024-05-18
Journal of Classification
Abstract: Probability density machine (PDM) is a novel algorithm, proposed recently, for addressing the class imbalance learning (CIL) problem. PDM captures prior information about the data distribution well and performs robustly in various CIL applications. However, we note that PDM is sensitive to CIL data with varying density and/or small disjunctions, i.e., data in which a single class contains two or more distinct sub-clusters; on such data, the estimate of the conditional probability can be highly inaccurate. To address this problem, we introduce the shared nearest neighbors (SNN) clustering technique into the PDM procedure and propose a novel algorithm, SNN-PDM. Specifically, SNN adapts to varying density and captures the small disjunctions present in the data distribution. We evaluate the proposed algorithm on a large number of CIL datasets, and the results show that SNN-PDM outperforms PDM and several previous methods. Moreover, compared with PDM, SNN-PDM is less time-consuming.
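The abstract does not spell out how SNN clustering is wired into PDM. As a rough illustration of why shared-nearest-neighbor similarity can separate the sub-clusters (small disjunctions) inside one class, here is a minimal Jarvis-Patrick-style sketch. It is not the authors' implementation: the function names and the parameters `k` (neighborhood size) and `min_shared` (link threshold) are hypothetical choices for this example. Two points are linked only if each is in the other's k-nearest-neighbor list and they share at least `min_shared` neighbors; connected components of the resulting graph are treated as sub-clusters.

```python
import math
from itertools import combinations


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors (itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p; ties broken by index.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in others[:k]})
    return neighbors


def snn_similarity(neighbors, i, j):
    """Count shared neighbors of i and j, but only if they are in each other's kNN lists."""
    if j in neighbors[i] and i in neighbors[j]:
        return len(neighbors[i] & neighbors[j])
    return 0


def snn_clusters(points, k, min_shared):
    """Hypothetical sketch: link pairs with SNN similarity >= min_shared and
    return connected-component labels (0, 1, 2, ...) in order of first appearance."""
    n = len(points)
    neighbors = knn_lists(points, k)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if snn_similarity(neighbors, i, j) >= min_shared:
            parent[find(i)] = find(j)

    # Map each component root to a consecutive integer label.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Usage: one "class" made of two well-separated groups (a small disjunction).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(snn_clusters(points, k=3, min_shared=1))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because similarity depends on neighbor overlap rather than raw distance, the same `k` can group points in both dense and sparse regions, which is the property the abstract attributes to SNN; a per-sub-cluster density estimate could then be computed on each label group, but how SNN-PDM does this is not stated here.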
Mathematics, Interdisciplinary Applications; Psychology, Mathematical