MEGAnno+: A Human-LLM Collaborative Annotation System

Hannah Kim, Kushan Mitra, Rafael Li Chen, Sajjadur Rahman, Dan Zhang

2024-02-28

Abstract: Large language models (LLMs) can label data faster and cheaper than humans for various NLP tasks. Despite their prowess, LLMs may fall short in understanding complex, sociocultural, or domain-specific context, potentially leading to incorrect annotations. Therefore, we advocate a collaborative approach where humans and LLMs work together to produce reliable and high-quality labels. We present MEGAnno+, a human-LLM collaborative annotation system that offers effective LLM agent and annotation management, convenient and robust LLM annotation, and exploratory verification of LLM labels by humans.

Computation and Language, Human-Computer Interaction

What problem does this paper attempt to address?

The paper addresses how to combine the strengths of large language models (LLMs) and human annotators to improve the quality and efficiency of data annotation. It proposes MEGAnno+, a human-LLM collaborative annotation system that targets four problems:

1. **Limitations of LLM annotation**: LLMs can generate annotations quickly and at low cost, but they may label data incorrectly when the task depends on complex, sociocultural, or domain-specific context. Humans are therefore needed to verify and correct these annotations.
2. **Efficient human-LLM collaboration**: Existing annotation tools typically either support human annotation only or rely entirely on LLMs, and lack an effective mechanism for the two to work together. MEGAnno+ combines flexible LLM management with convenient human verification.
3. **Annotation management and reuse**: During annotation, users frequently adjust model configurations and prompt templates. MEGAnno+ stores and manages the LLM configurations and prompt templates that have been used, so that annotation runs can be reused and compared.
4. **Quality assurance of annotated data**: MEGAnno+ provides selective and exploratory verification, letting users filter and sort LLM annotations by label and metadata so that suspicious results are verified first.

In summary, the paper aims to improve the quality and efficiency of data annotation through human-LLM collaboration while compensating for the limitations of LLMs as annotators.
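The selective verification idea in point 4 can be illustrated with a minimal sketch. This is not the MEGAnno+ API: the `LLMAnnotation` class, the `verification_queue` function, and the `confidence` metadata field are all hypothetical names chosen for illustration, and the sample records are made up. The sketch only assumes that each LLM label carries some numeric metadata that a human reviewer can filter and sort on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LLMAnnotation:
    """One LLM-produced label plus metadata (hypothetical structure)."""
    text: str
    label: str
    confidence: float  # assumed metadata field, e.g. a model-reported score


def verification_queue(
    annotations: List[LLMAnnotation],
    label: Optional[str] = None,
    max_confidence: float = 1.0,
) -> List[LLMAnnotation]:
    """Filter by label and confidence, then sort least-confident first."""
    selected = [
        a for a in annotations
        if (label is None or a.label == label) and a.confidence <= max_confidence
    ]
    # Lowest confidence first, so the most suspicious labels are reviewed first.
    return sorted(selected, key=lambda a: a.confidence)


# Made-up sample data for illustration only.
annotations = [
    LLMAnnotation("Great battery life", "positive", 0.97),
    LLMAnnotation("Well, that was something", "negative", 0.52),
    LLMAnnotation("Arrived on time", "positive", 0.61),
    LLMAnnotation("Not bad at all", "negative", 0.40),
]

# Everything the model was not very sure about, most doubtful first.
queue = verification_queue(annotations, max_confidence=0.7)

# Only the "negative" labels, again most doubtful first.
negatives = verification_queue(annotations, label="negative")

for item in queue:
    print(f"{item.confidence:.2f}  {item.label:<8}  {item.text}")
```

Sorting by ascending confidence is only one possible prioritization; a reviewer could equally sort by text length, by disagreement between two prompts, or by any other metadata the system records.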


Human-LLM Collaborative Annotation Through Effective Verification of LLM Labels

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

LLMaAA: Making Large Language Models as Active Annotators

MEGAnno: Exploratory Labeling for NLP in Computational Notebooks

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs

Large Language Models for Data Annotation: A Survey

Augmenting NER Datasets with LLMs: Towards Automated and Refined Annotation

Large Language Models for Data Annotation and Synthesis: A Survey

Entity Alignment with Noisy Annotations from Large Language Models

LLMs Accelerate Annotation for Medical Information Extraction

LLM Chain Ensembles for Scalable and Accurate Data Annotation

From Human Annotation to LLMs: SILICON Annotation Workflow for Management Research

The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation

LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI

Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

Under the Surface: Tracking the Artifactuality of LLM-Generated Data

Can LLMs Replace Manual Annotation of Software Engineering Artifacts?

Making Large Language Models Better Data Creators