A boosting framework for positive-unlabeled learning
Yawen Zhao, Mingzhe Zhang, Chenhao Zhang, Weitong Chen, Nan Ye, Miao Xu
DOI: https://doi.org/10.1007/s11222-024-10529-y
IF: 2.3241
2024-11-23
Statistics and Computing
Abstract: Positive-unlabeled (PU) learning deals with binary classification problems where only positive and unlabeled data are available. In this paper, we introduce a novel boosting framework, Adaptive PU (AdaPU), for learning from PU data. AdaPU builds an ensemble of weak classifiers using weak learners tailored to PU data. We propose two main approaches for learning the weak classifiers: a direct loss minimization approach that learns weak classifiers to greedily minimize PU-data-based estimates of the exponential loss, specifically, the unbiased PU estimate and the non-negative PU estimate; and a constrained loss minimization approach that learns weak classifiers to greedily minimize the unbiased PU estimate of the exponential loss, subject to regularization constraints. The direct loss minimization approach, while natural and simple, often yields weak learners prone to overfitting or leads to computationally expensive algorithms. On the other hand, the constrained loss minimization approach can effectively alleviate overfitting and allow the design of efficient weak learners. In particular, we propose a tailored weak learner for the simple class of decision stumps, or one-level decision trees, which interestingly demonstrates strong performance in comparison to various other weak classifiers. Furthermore, we provide several theoretical results on the performance of AdaPU. We performed extensive experiments to evaluate the variants of AdaPU and various baseline algorithms. Our results demonstrate the effectiveness of the constrained loss minimization approach.
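The two PU-data-based estimates of the exponential loss named in the abstract can be illustrated with a minimal sketch. It assumes the forms commonly used in the PU learning literature: the unbiased estimate rewrites the negative-class risk as the risk on unlabeled data minus the prior-weighted risk on positives, and the non-negative estimate clips that difference at zero. The function names (`exp_loss`, `pu_risk_estimates`, `stump`), the `prior` argument, and the assumption that the class prior is known are illustrative choices, not taken from the paper, and the paper's exact weighting inside the boosting loop may differ.

```python
import math


def exp_loss(score, label):
    """Exponential loss exp(-y * f(x)) for a label y in {+1, -1}."""
    return math.exp(-label * score)


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def pu_risk_estimates(pos_scores, unl_scores, prior):
    """Return (unbiased, non_negative) PU estimates of the exponential risk.

    pos_scores: classifier scores f(x) on the labeled positive examples
    unl_scores: classifier scores f(x) on the unlabeled examples
    prior:      class prior P(y = +1), assumed known in this sketch
    """
    pos_as_pos = mean([exp_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([exp_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([exp_loss(s, -1) for s in unl_scores])

    # Risk contributed by the positive class.
    positive_part = prior * pos_as_pos
    # Estimate of the negative-class risk built from unlabeled and positive data.
    # This difference can fall below zero on finite samples.
    negative_part = unl_as_neg - prior * pos_as_neg

    unbiased = positive_part + negative_part
    non_negative = positive_part + max(0.0, negative_part)
    return unbiased, non_negative


def stump(x, feature, threshold, polarity=1):
    """Decision stump: a one-level tree that thresholds a single feature.

    Returns `polarity` when x[feature] > threshold, otherwise -polarity.
    """
    return polarity if x[feature] > threshold else -polarity
```

With a classifier that scores the positives very confidently, the unbiased estimate can become negative, which is one way the overfitting mentioned in the abstract can show up; the non-negative variant stays at or above zero by construction.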
statistics & probability; computer science, theory & methods