MEGAnno+: A Human-LLM Collaborative Annotation System

Hannah Kim, Kushan Mitra, Rafael Li Chen, Sajjadur Rahman, Dan Zhang
2024-02-28
Abstract: Large language models (LLMs) can label data faster and cheaper than humans for various NLP tasks. Despite their prowess, LLMs may fall short in understanding of complex, sociocultural, or domain-specific context, potentially leading to incorrect annotations. Therefore, we advocate a collaborative approach where humans and LLMs work together to produce reliable and high-quality labels. We present MEGAnno+, a human-LLM collaborative annotation system that offers effective LLM agent and annotation management, convenient and robust LLM annotation, and exploratory verification of LLM labels by humans.
Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
The paper asks how to combine the strengths of large language models (LLMs) and human annotators to improve both the quality and the efficiency of data annotation. It proposes MEGAnno+, a human-LLM collaborative annotation system that targets four problems:

1. **Limitations of LLM annotation**: LLMs can generate annotations quickly and at low cost, but they may label data incorrectly when the context is complex, sociocultural, or domain-specific. Human involvement is therefore needed to verify and correct these annotations.
2. **Efficient human-LLM collaboration**: Existing annotation tools typically support either human annotation or fully LLM-driven annotation, and lack an effective mechanism for the two to work together. MEGAnno+ provides flexible LLM management alongside convenient human verification.
3. **Annotation management and reuse**: During annotation, users frequently adjust model configurations and prompt templates. MEGAnno+ stores and manages the LLM models and prompt templates that have been used, so annotation runs can be reused and compared.
4. **Quality assurance of annotated data**: MEGAnno+ provides selective and exploratory verification. Users can filter and sort annotations by label and metadata, and so prioritize the verification of suspicious results.

In summary, the paper's main goal is to improve the quality and efficiency of data annotation through human-LLM collaboration while compensating for the limitations of LLMs as annotators.
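The selective verification idea (filtering and sorting LLM labels so that a human reviews the most suspicious ones first) can be illustrated with a minimal Python sketch. This is not the MEGAnno+ API: the `Annotation` class, the `select_for_review` function, and the use of a `confidence` score as the metadata field are all assumptions made for illustration, and the sample records are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """One LLM-produced label plus metadata (hypothetical structure)."""
    text: str
    label: str
    confidence: float  # assumed metadata field, between 0.0 and 1.0


def select_for_review(
    annotations: List[Annotation],
    label: Optional[str] = None,
    max_confidence: float = 1.0,
) -> List[Annotation]:
    """Filter by label and confidence, then sort least-confident first."""
    candidates = [
        a for a in annotations
        if (label is None or a.label == label)
        and a.confidence <= max_confidence
    ]
    # Lowest confidence first, so a reviewer sees the most doubtful labels early.
    return sorted(candidates, key=lambda a: a.confidence)


# Invented sample data for demonstration only.
annotations = [
    Annotation("Great battery life", "positive", 0.95),
    Annotation("Yeah, right, 'great' service", "positive", 0.55),
    Annotation("Arrived late", "negative", 0.80),
    Annotation("Not bad at all", "negative", 0.40),
]

if __name__ == "__main__":
    for a in select_for_review(annotations, max_confidence=0.8):
        print(f"{a.confidence:.2f}  {a.label:<8}  {a.text}")
```

Sorting by ascending confidence is only one possible prioritization. Any other metadata the system records could serve as the sort key in the same way.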