Perturbation-Based Two-Stage Multi-Domain Active Learning

Rui He, Zeyu Dai, Shan He, Ke Tang
2023-06-19
Abstract: In multi-domain learning (MDL) scenarios, high labeling effort is required due to the complexity of collecting data from various domains. Active Learning (AL) presents an encouraging solution to this issue by annotating a smaller number of highly informative instances, thereby reducing the labeling effort. Previous research has relied on conventional AL strategies for MDL scenarios, which underutilize the domain-shared information of each instance during the selection procedure. To mitigate this issue, we propose a novel perturbation-based two-stage multi-domain active learning (P2S-MDAL) method incorporated into the well-regarded ASP-MTL model. Specifically, P2S-MDAL involves allocating budgets for domains and establishing regions for diversity selection, which are further used to select the most cross-domain influential samples in each region. A perturbation metric has been introduced to evaluate the robustness of the shared feature extractor of the model, facilitating the identification of potentially cross-domain influential samples. Experiments are conducted on three real-world datasets, encompassing both texts and images. The superior performance over conventional AL strategies shows the effectiveness of the proposed strategy. Additionally, an ablation study has been carried out to demonstrate the validity of each component. Finally, we outline several intriguing potential directions for future MDAL research, thus catalyzing the field's advancement.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the high annotation cost of Multi-Domain Learning (MDL). MDL requires collecting data from multiple domains, which makes data collection complex and time-consuming. Active Learning (AL) offers a promising remedy: it selects a small number of highly informative instances for annotation, which reduces the annotation workload. However, traditional AL strategies have two main limitations when applied to MDL:

1. **Under-utilization of domain-shared information**: Traditional AL strategies fail to fully consider the domain-shared information of each instance during the selection process.
2. **Incomparable scores across different domains**: Scores from different domains are mixed together, which may lead to biased selection.

To address these limitations, the authors propose a new Perturbation-Based Two-Stage Multi-Domain Active Learning method (P2S-MDAL). The method is built on the well-known ASP-MTL model and improves the AL strategy in two stages:

- **First stage**: Allocate budgets according to the influence of domains and establish regions for diversity selection.
- **Second stage**: Use perturbation to evaluate cross-domain influence and select the most influential samples in each region.

Specifically, P2S-MDAL identifies potentially cross-domain influential samples by introducing a perturbation metric that evaluates the robustness of the model's shared feature extractor.

Experimental results show that P2S-MDAL outperforms traditional AL strategies on three real-world datasets covering both text and images. The authors also conduct an ablation study to verify the effectiveness of each component and outline several directions for future MDAL research.

### Summary

The main contributions of this paper include:

1. Proposing P2S-MDAL, the first AL strategy specifically designed for MDL.
2. Introducing a perturbation-based method to evaluate the cross-domain influence of instances.
3. Providing experimental evidence of its effectiveness on multiple real-world datasets and pointing out future research directions.

Through these improvements, P2S-MDAL can reduce annotation costs in multi-domain learning more effectively while maintaining or improving model performance.
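
The two-stage selection described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (`allocate_budget`, `perturbation_score`, `select_per_region`), the proportional budget rule, and the use of Gaussian input noise as the perturbation are all assumptions, since the text here does not specify the exact budget formula or perturbation metric. The region labels are assumed to come from some clustering step that is not shown.

```python
import numpy as np


def allocate_budget(domain_influence, total_budget):
    """Stage 1 (assumed rule): split the labeling budget across domains
    in proportion to a per-domain influence weight, using
    largest-remainder rounding so the parts sum to total_budget."""
    w = np.asarray(domain_influence, dtype=float)
    shares = w / w.sum() * total_budget
    budget = np.floor(shares).astype(int)
    remainder = int(total_budget - budget.sum())
    # Hand leftover units to the domains with the largest fractional parts.
    order = np.argsort(-(shares - budget), kind="stable")
    budget[order[:remainder]] += 1
    return budget


def perturbation_score(shared_extractor, x, eps=0.01, n_draws=8, seed=0):
    """Stage 2 (assumed metric): average shift of the shared features
    when small Gaussian noise is added to the inputs. A larger shift
    means the shared extractor is less robust around that sample."""
    rng = np.random.default_rng(seed)
    base = shared_extractor(x)
    shifts = []
    for _ in range(n_draws):
        noisy = x + eps * rng.standard_normal(x.shape)
        shifts.append(np.linalg.norm(shared_extractor(noisy) - base, axis=1))
    return np.mean(shifts, axis=0)


def select_per_region(scores, region_ids):
    """Pick the index of the highest-scoring sample in each region."""
    picked = []
    for r in np.unique(region_ids):
        members = np.flatnonzero(region_ids == r)
        picked.append(int(members[np.argmax(scores[members])]))
    return picked
```

In this sketch, one would call `allocate_budget` once per AL round, partition each domain's unlabeled pool into as many regions as its budget, and then query the sample returned by `select_per_region` for each region. Choosing one sample per region is what keeps the selection diverse, while the score decides which sample within a region is queried.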