Abstract: This paper studies Positive and Unlabeled learning (PU learning), whose goal is to build a binary classifier when only positive data and unlabeled data are available for training. To deal with the absence of negative training data, we first regard all unlabeled data as negative examples with false negative labels, and then convert PU learning into a risk minimization problem in the presence of such one-sided label noise. Specifically, we propose a novel PU learning algorithm dubbed "Loss Decomposition and Centroid Estimation" (LDCE). By decomposing the loss function of corrupted negative examples into two parts, we show that only the second part is affected by the noisy labels. Thereby, we can estimate the centroid of the corrupted negative set in an unbiased way to reduce the adverse impact of such label noise. Furthermore, we propose "Kernelized LDCE" (KLDCE) by introducing the kernel trick, and show that KLDCE can be easily solved by combining Alternative Convex Search (ACS) and Sequential Minimal Optimization (SMO). Theoretically, we derive a generalization error bound which suggests that the generalization risk of our model converges to the empirical risk at the rate of $\mathcal{O}(1/\sqrt{k}+1/\sqrt{n-k}+1/\sqrt{n})$, where $n$ and $k$ are the numbers of training data and positive data, respectively. Experimentally, we conduct extensive experiments on a synthetic dataset, UCI benchmark datasets and real-world datasets, and the results demonstrate that our approaches (LDCE and KLDCE) achieve top-level performance when compared with both classic and state-of-the-art PU learning methods.
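The general loss-decomposition and centroid-estimation idea can be illustrated with a short sketch. It assumes a linear scorer $f(x)=w^\top x$ and the logistic loss, which satisfies $\ell(t)-\ell(-t)=-t$, so that $\ell(yt)=\tfrac12[\ell(t)+\ell(-t)]-\tfrac12 yt$. The first term never sees the label, and the labels enter the empirical risk only through the centroid $\frac1n\sum_i y_i x_i$. The centroid estimator further assumes a known class prior $\pi$ and unlabeled data drawn from $\pi p_+(x)+(1-\pi)p_-(x)$, which gives the unbiased estimate $\hat\mu_- = (\bar x_U-\pi\bar x_P)/(1-\pi)$. This is one standard way to realize the idea, not necessarily the exact loss or estimator used by LDCE; all function names and the toy data are hypothetical.

```python
import math
import random


def logistic_loss(t):
    """Numerically stable logistic loss log(1 + exp(-t))."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mean(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    d = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(d)]


def direct_risk(w, xs, ys):
    """Plain empirical risk (1/n) * sum_i l(y_i * w.x_i)."""
    return sum(logistic_loss(y * dot(w, x)) for x, y in zip(xs, ys)) / len(xs)


def decomposed_risk(w, xs, ys):
    """Same risk, split into a label-free part and a centroid part.

    Uses l(t) - l(-t) = -t, hence l(y*t) = 0.5*(l(t) + l(-t)) - 0.5*y*t.
    """
    n = len(xs)
    label_free = sum(
        0.5 * (logistic_loss(dot(w, x)) + logistic_loss(-dot(w, x))) for x in xs
    ) / n
    # The labels enter only through this label-weighted centroid (1/n) * sum_i y_i x_i.
    centroid = mean([[y * xj for xj in x] for x, y in zip(xs, ys)])
    return label_free - 0.5 * dot(w, centroid)


def estimate_negative_centroid(pos, unl, prior):
    """Unbiased negative-class mean, assuming E_U[x] = prior*mu_+ + (1-prior)*mu_-."""
    mu_p, mu_u = mean(pos), mean(unl)
    return [(u - prior * p) / (1 - prior) for u, p in zip(mu_u, mu_p)]


if __name__ == "__main__":
    rng = random.Random(0)
    prior = 0.3  # assumed known class prior of the unlabeled set

    def sample(label):
        # Hypothetical 2-D Gaussians: positives centred at (2, 0), negatives at (-2, 0).
        cx = 2.0 if label > 0 else -2.0
        return [rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)]

    pos = [sample(+1) for _ in range(500)]
    hidden = [1 if rng.random() < prior else -1 for _ in range(2000)]
    unl = [sample(y) for y in hidden]

    # ys uses the hidden true labels, only to check the decomposition identity.
    w = [0.7, -0.2]
    xs, ys = pos + unl, [1] * len(pos) + hidden
    print(abs(decomposed_risk(w, xs, ys) - direct_risk(w, xs, ys)) < 1e-9)

    print("naive U mean:", [round(v, 2) for v in mean(unl)])
    print("corrected   :", [round(v, 2) for v in estimate_negative_centroid(pos, unl, prior)])
```

Running it should print `True` for the decomposition check. The naive mean of the unlabeled set is pulled toward the positives (its first coordinate lands roughly around -0.8), while the prior-corrected estimate should land close to the assumed negative mean of -2.0.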

$f_{BGD}$: Learning Embeddings from Positive Unlabeled Data with BGD

$f_{BGD}$: Learning Embeddings From Positive-Only Data with BGD

UniGrad-FS: Unified Gradient Projection with Flatter Sharpness for Continual Learning

GradPU: Positive-Unlabeled Learning via Gradient Penalty and Positive Upweighting

Balanced Gradient Penalty Improves Deep Long-Tailed Learning

Efficient Training for Positive Unlabeled Learning

Inverse-Free Fast Natural Gradient Descent Method for Deep Learning

Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning

A boosting framework for positive-unlabeled learning

A Boosting Algorithm for Positive-Unlabeled Learning

Neighbor Does Matter: Curriculum Global Positive-Negative Sampling for Vision-Language Pre-training

Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training

Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation

Positive Unlabeled Contrastive Learning

Mini-batch Gradient Descent with Buffer

Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning

Perturbated Gradients Updating within Unit Space for Deep Learning

Instance-Dependent PU Learning by Bayesian Optimal Relabeling

Graph-based Boosting Algorithm to Learn Labeled and Unlabeled Data

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)

Positive and Unlabeled Learning with Label Disambiguation