Abstract: Learning from positive and unlabeled data is known in the literature as positive-unlabeled (PU) learning and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, unlabeled positive samples are inevitably misclassified as negative, and these errors may accumulate during training. They often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. A similar intuition underlies curriculum learning, which uses only easier cases in the early stage of training before introducing more complex ones. Specifically, we use a novel ``hardness'' measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy then refines the selection of negative samples during training so that more ``easy'' samples are included in the early stage. Extensive experimental validation over a wide range of learning tasks shows that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU
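
As a rough illustration of the easy-first idea described above, the following minimal sketch selects pseudo-negatives from the unlabeled set in order of increasing hardness. It is not the authors' implementation: the abstract does not define the hardness measure or the schedule, so the per-sample `hardness` scores, the linear schedule in `curriculum_fraction`, and the `start`/`end` fractions are all hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def curriculum_fraction(step: int, total_steps: int,
                        start: float = 0.2, end: float = 0.8) -> float:
    """Hypothetical linear schedule: share of unlabeled samples used as pseudo-negatives."""
    if total_steps <= 1:
        return end
    # Clamp the step to the valid range, then interpolate between start and end.
    t = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    return start + t * (end - start)


def select_easy_negatives(hardness: Sequence[float], fraction: float) -> List[int]:
    """Return sorted indices of the `fraction` of unlabeled samples with the lowest hardness."""
    k = max(1, int(len(hardness) * fraction))
    order = sorted(range(len(hardness)), key=lambda i: hardness[i])
    return sorted(order[:k])


# Assumed hardness scores for six unlabeled samples (lower = more confidently negative).
hardness = [0.05, 0.90, 0.30, 0.10, 0.70, 0.45]
for step in range(3):
    frac = curriculum_fraction(step, total_steps=3)
    print(step, round(frac, 2), select_easy_negatives(hardness, frac))
```

In a full training loop, the hardness scores would be recomputed from the current model at each iteration before reselecting the pseudo-negatives; that part is omitted here because the abstract does not specify how it is done.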

Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets

Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning

Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training

Efficient Training for Positive Unlabeled Learning

Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning

A boosting framework for positive-unlabeled learning

PUe: Biased Positive-Unlabeled Learning Enhancement by Causal Inference

Meta-learning for Positive-unlabeled Classification

A Boosting Algorithm for Positive-Unlabeled Learning

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets

PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision

Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning

False Positive Rate Control for Positive Unlabeled Learning

Contrastive Approach to Prior Free Positive Unlabeled Learning

Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation

Positive Unlabeled Contrastive Learning

A Variational Approach for Learning from Positive and Unlabeled Data

Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation

Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction

Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning