Abstract: In this work we propose a pragmatic method that reduces the annotation cost for structured label spaces using active learning. Our approach leverages partial annotation, which reduces labeling costs for structured outputs by selecting only the most informative sub-structures for annotation. We also utilize self-training to incorporate the current model's automatic predictions as pseudo-labels for un-annotated sub-structures. A key challenge in effectively combining partial annotation with self-training to reduce annotation cost is determining which sub-structures to select to label. To address this challenge, we adopt an error estimator to adaptively decide the partial selection ratio according to the current model's capability. In evaluations spanning four structured prediction tasks, we show that our combination of partial annotation and self-training using an adaptive selection ratio reduces annotation cost over strong full annotation baselines under a fair comparison scheme that takes reading time into consideration.

What problem does this paper attempt to address?

The problem this paper addresses is how to reduce annotation cost in structured prediction tasks. Specifically, the authors propose a method that combines partial annotation (PA) with self-training. Through active learning (AL), the most informative sub-structures are selected for annotation, and the model's automatic predictions are used as pseudo-labels for the un-annotated sub-structures, reducing the annotation workload while preserving model performance.

### Main Problems

1. **Reducing Annotation Cost**: Structured prediction tasks usually require a large amount of annotated data, which is both time-consuming and costly to produce. The authors aim to reduce the amount of annotation required, through partial annotation and self-training, without sacrificing model performance.
2. **Selecting Appropriate Sub-Structures**: In partial annotation, deciding which sub-structures to send for labeling is the key issue. The authors propose an adaptive selection strategy based on an error estimator, which adjusts the selection ratio according to the current capability of the model.
3. **Effectively Utilizing Un-annotated Data**: Through self-training, the model's predictions on un-annotated sub-structures are used as additional training signals to further improve model performance.

### Method Overview

- **Partial Annotation (PA)**: Select only the most uncertain sub-structures in a sentence for annotation, rather than every structure in the sentence. This reduces the annotation workload.
- **Self-Training**: Use the model's predictions on un-annotated sub-structures as pseudo-labels to strengthen training.
- **Adaptive Selection Strategy**: Determine the partial selection ratio dynamically through an error estimator, so that the annotation effort matches the current model's capability.
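The three components above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SubStructure` record, the `estimate_error_rate` helper (which simply averages `1 - confidence` and assumes roughly calibrated confidences), and the choice to set the selection ratio equal to the estimated error rate are all hypothetical stand-ins for the paper's error estimator.

```python
from dataclasses import dataclass


@dataclass
class SubStructure:
    """Hypothetical record for one predicted sub-structure (e.g. one token tag)."""
    sent_id: int       # index of the sentence this sub-structure belongs to
    position: int      # position inside the sentence, e.g. a token index
    predicted: str     # the current model's prediction
    confidence: float  # model probability of that prediction, in [0, 1]


def estimate_error_rate(items):
    """Assumed error estimator: the mean of (1 - confidence).

    This stands in for the paper's estimator, which may work differently.
    """
    if not items:
        return 0.0
    return sum(1.0 - s.confidence for s in items) / len(items)


def split_for_annotation(items, ratio):
    """Send the `ratio` least-confident items to annotators.

    Everything else keeps the model's own prediction as a pseudo-label
    (the self-training part).
    """
    ranked = sorted(items, key=lambda s: s.confidence)  # most uncertain first
    k = min(len(ranked), max(0, round(ratio * len(ranked))))
    to_annotate = ranked[:k]
    pseudo_labeled = [(s, s.predicted) for s in ranked[k:]]
    return to_annotate, pseudo_labeled


if __name__ == "__main__":
    pool = [
        SubStructure(0, 0, "B-PER", 0.99),
        SubStructure(0, 1, "O", 0.55),
        SubStructure(0, 2, "B-LOC", 0.40),
        SubStructure(1, 0, "O", 0.95),
    ]
    ratio = estimate_error_rate(pool)  # adaptive: shrinks as the model improves
    to_annotate, pseudo = split_for_annotation(pool, ratio)
    print("ask annotators about:", [(s.sent_id, s.position) for s in to_annotate])
    print("pseudo-labeled:", [(s.sent_id, s.position, y) for s, y in pseudo])
```

Under these assumptions, a model that grows more confident produces a smaller estimated error rate, so fewer sub-structures are sent to annotators and more are covered by pseudo-labels.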
### Experimental Setup

- **Tasks**: Named Entity Recognition (NER), Dependency Parsing (DPAR), Event Extraction, and Relation Extraction.
- **Datasets**: CoNLL-2003 (NER), English Web Treebank (DPAR), ACE05 (Event Extraction and Relation Extraction).
- **Evaluation Metrics**: Reading cost (measured by the total number of words in the sentences) and annotation cost (measured by the number of annotated sub-structures).

### Results

- **NER**: Under the same reading cost, partial annotation (PA) achieves performance comparable to full annotation (FA) with fewer annotated sub-structures.
- **DPAR**: PA likewise maintains performance similar to FA while reducing annotation cost.
- **Adaptive Selection Strategy**: The adaptive strategy adjusts the selection ratio according to the current capability of the model, which cuts unnecessary annotation.

### Conclusion

By combining partial annotation with self-training, the paper reduces annotation cost across multiple structured prediction tasks while maintaining model performance. The adaptive selection strategy and self-training both play a key role in this result.
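The two cost measures named under Experimental Setup can be made concrete with a small sketch. It assumes sentences are given as token lists and that annotated sub-structures are identified by token positions; the function names and the toy data are illustrative only and do not come from the paper.

```python
def reading_cost(sentences):
    """Total number of words across all sentences shown to the annotator."""
    return sum(len(tokens) for tokens in sentences)


def annotation_cost(annotated_positions):
    """Number of sub-structures a human labeled, summed over sentences."""
    return sum(len(positions) for positions in annotated_positions)


if __name__ == "__main__":
    sentences = [["John", "lives", "in", "Paris"], ["It", "rains"]]

    # Full annotation (FA): every token in every sentence is labeled.
    fa_positions = [list(range(len(tokens))) for tokens in sentences]
    # Partial annotation (PA): only selected tokens are labeled (toy choice).
    pa_positions = [[0, 3], [1]]

    print("reading cost:", reading_cost(sentences))
    print("FA annotation cost:", annotation_cost(fa_positions))
    print("PA annotation cost:", annotation_cost(pa_positions))
```

In this toy case both schemes incur the same reading cost, because the annotator reads the same sentences, while PA incurs a lower annotation cost. That is the kind of comparison the summary describes for NER.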

Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training

