Positive-Unlabeled Domain Adaptation

Jonas Sonntag, Gunnar Behrens, Lars Schmidt-Thieme
DOI: https://doi.org/10.48550/arXiv.2202.05695
2022-02-11
Abstract: Domain Adaptation methodologies have been shown to generalize effectively from a labeled source domain to a label-scarce target domain. Previous research has either focused on unlabeled domain adaptation without any target supervision or semi-supervised domain adaptation with few labeled target examples per class. On the other hand, Positive-Unlabeled (PU-) Learning has attracted increasing interest in the weakly supervised learning literature, since in many real-world applications positive labels are much easier to obtain than negative ones. In this work we are the first to introduce the challenge of Positive-Unlabeled Domain Adaptation, where we aim to generalize from a fully labeled source domain to a target domain where only positive and unlabeled data is available. We present a novel two-step learning approach to this problem by first identifying reliable positive and negative pseudo-labels in the target domain, guided by source domain labels and a positive-unlabeled risk estimator. This enables us to use a standard classifier on the target domain in a second step. We validate our approach by running experiments on benchmark datasets for visual object recognition. Furthermore, we propose real-world examples for our setting and validate our superior performance on parking occupancy data.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the problem of **Positive-Unlabeled Domain Adaptation (PU-DA)**, that is, domain adaptation under the following conditions:

1. **Source Domain**: A fully labeled dataset in which every sample has a label.
2. **Target Domain**: Only positively labeled samples and unlabeled data are available. No sample in the target domain carries a negative label.

#### Background and Challenges

In practical applications, obtaining a large amount of labeled data is a major challenge in machine learning, especially because deep learning methods rely on large labeled training sets. Researchers have proposed several settings that reduce this requirement:

- **Unsupervised Domain Adaptation (UDA)**: The target domain has no labels at all.
- **Semi-Supervised Domain Adaptation (SSDA)**: The target domain has a small amount of labeled data.
- **Positive-Unlabeled Learning (PU-Learning)**: Only positive labels and unlabeled data are available for training.

However, previous studies did not combine PU-Learning with domain adaptation. This paper is the first to pose and address the problem of **Positive-Unlabeled Domain Adaptation**, that is, how to generalize from a fully labeled source domain to a target domain with only positive labels and unlabeled data.

#### Research Contributions

1. **Introducing a new problem setting**: Positive-Unlabeled Domain Adaptation is defined for the first time, and a new learning framework is proposed to solve it.
2. **Evaluating existing models**: The performance of existing domain adaptation and PU-Learning models is evaluated in this new setting.
3. **Proposing a two-step learning method**:
   - **Step 1**: Guided by the source domain labels and the positive labels in the target domain, identify reliable positive and negative pseudo-labels in the target domain.
   - **Step 2**: Use these pseudo-labels to train a standard classifier on the target domain.
4. **Experimental validation**: The method is evaluated on multiple benchmark datasets (such as Office-Home, Office-Caltech, and MNIST-USPS), and its performance is also demonstrated on a real-world application, parking lot occupancy prediction.

### Summary

The main goal of this paper is to perform effective domain adaptation from a fully labeled source domain when the target domain offers only positive labels and unlabeled data. Using a new two-step learning method, the authors report improved classification performance on multiple benchmark datasets and demonstrate the potential of the method in practical applications.
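To make the two-step method from the contributions list more concrete, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the text above does not confirm: the PU risk is the non-negative PU risk estimator with a sigmoid loss (the summary does not say which estimator the paper uses), pseudo-labels are chosen by simple score thresholds, and the "standard classifier" is a 1-D logistic regression. All function names, thresholds, and toy values are hypothetical.

```python
import math

def sigmoid_loss(margin):
    """Sigmoid surrogate loss; small when margin = y * score is large."""
    return 1.0 / (1.0 + math.exp(margin))

def mean(values):
    return sum(values) / len(values)

def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk (assumed estimator, not confirmed by the source).

    pos_scores: model scores on labeled positive target samples
    unl_scores: model scores on unlabeled target samples
    prior:      assumed positive class prior in the target domain
    """
    risk_pos_as_pos = mean([sigmoid_loss(s) for s in pos_scores])
    risk_pos_as_neg = mean([sigmoid_loss(-s) for s in pos_scores])
    risk_unl_as_neg = mean([sigmoid_loss(-s) for s in unl_scores])
    # Clamp the estimated negative-class risk at zero.
    neg_part = max(0.0, risk_unl_as_neg - prior * risk_pos_as_neg)
    return prior * risk_pos_as_pos + neg_part

def select_pseudo_labels(unl_scores, pos_thresh, neg_thresh):
    """Step 1 (sketch): keep only confidently scored unlabeled samples.

    Returns (index, label) pairs with label 1 (positive) or 0 (negative).
    """
    pseudo = []
    for i, score in enumerate(unl_scores):
        if score >= pos_thresh:
            pseudo.append((i, 1))
        elif score <= neg_thresh:
            pseudo.append((i, 0))
    return pseudo

def fit_logistic_1d(xs, ys, lr=0.5, epochs=500):
    """Step 2 (sketch): 1-D logistic regression trained by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Toy 1-D target features with scores from a hypothetical step-1 model.
unl_features = [-2.0, -1.5, -0.1, 0.2, 1.4, 2.1]
unl_scores = [-3.0, -2.2, -0.2, 0.3, 2.5, 3.1]
risk = nn_pu_risk([2.0, 3.0], unl_scores, prior=0.5)

pseudo = select_pseudo_labels(unl_scores, pos_thresh=2.0, neg_thresh=-2.0)
xs = [unl_features[i] for i, _ in pseudo]
ys = [label for _, label in pseudo]
w, b = fit_logistic_1d(xs, ys)
```

In this sketch the two ambiguous samples near the decision boundary are left out of the pseudo-labeled set, so the second-stage classifier is trained only on the four confidently scored points. How the paper actually scores samples and uses the source domain labels in step 1 is not described in the summary above.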