LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

Nataliia Kholodna, Sahib Julka, Mohammad Khodadadi, Muhammed Nurullah Gumus, Michael Granitzer

2024-06-24

Abstract: Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling, rendering them rare and costly. The scarcity of data and the absence of preexisting tools exacerbate these challenges, especially since these languages may not be adequately represented in various NLP datasets. To address this gap, we propose leveraging the potential of LLMs in the active learning loop for data annotation. Initially, we conduct evaluations to assess inter-annotator agreement and consistency, facilitating the selection of a suitable LLM annotator. The chosen annotator is then integrated into a training loop for a classifier using an active learning paradigm, minimizing the amount of queried data required. Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data requirements, as indicated by estimated potential cost savings of at least 42.45 times compared to human annotation. Our proposed solution shows promising potential to substantially reduce both the monetary and computational costs associated with automation in low-resource settings. By bridging the gap between low-resource languages and AI, this approach fosters broader inclusion and shows the potential to enable automation across diverse linguistic landscapes.
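The annotator-selection step mentioned in the abstract can be illustrated with a small agreement computation. The abstract does not name the metric it uses, so the sketch below assumes Cohen's kappa, one common choice for two annotators. The function name `cohen_kappa`, the label values, and the toy sequences are all made up for illustration and do not come from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (assumed): one human reference and two runs of the same LLM annotator.
human = ["A", "B", "C", "B", "A", "B"]
llm_run_1 = ["A", "B", "C", "B", "B", "B"]
llm_run_2 = ["A", "B", "C", "B", "B", "B"]

agreement = cohen_kappa(human, llm_run_1)        # LLM vs. human reference
consistency = cohen_kappa(llm_run_1, llm_run_2)  # LLM vs. itself across runs
```

Comparing `agreement` across candidate LLMs, and `consistency` across repeated runs of one LLM, is one plausible way to rank annotators before putting one into the training loop.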

Computation and Language, Artificial Intelligence, Information Retrieval, Machine Learning

What problem does this paper attempt to address?

The paper addresses the high cost of data annotation for low-resource languages. Because data in these languages is scarce and expert annotators are rare and expensive, these languages face significant obstacles in AI development. The paper proposes placing a large language model (LLM) in the active learning loop as the annotator, so that a classifier is trained on only the samples it queries; this reduces both annotation cost and computational demand. In the paper's experiments, an LLM annotator such as GPT-4-Turbo yields near-state-of-the-art performance, with estimated potential cost savings of at least 42.45 times compared with human annotation. The authors conclude that integrating LLMs into the active learning framework can advance AI development for low-resource languages, broaden inclusion, and enable automation across diverse linguistic settings.
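The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm_annotate` is a hypothetical stand-in for a call to an LLM annotator, the classifier is a toy one-dimensional threshold model, and uncertainty sampling is assumed as the query strategy because the summary does not specify one.

```python
def llm_annotate(x):
    # Hypothetical stand-in for an LLM annotator call (e.g. GPT-4-Turbo).
    # Here the "true" class boundary is assumed to sit at 0.6.
    return int(x > 0.6)


def fit(labeled):
    # Toy 1-D classifier: threshold halfway between the two class means.
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def active_learning(pool, seed, rounds, batch_size):
    # Start from a small seed set labeled by the annotator.
    labeled = [(x, llm_annotate(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    threshold = fit(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Uncertainty sampling (assumed): query points nearest the boundary.
        unlabeled.sort(key=lambda x: abs(x - threshold))
        queries, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # Only the queried points are sent to the annotator.
        labeled += [(x, llm_annotate(x)) for x in queries]
        threshold = fit(labeled)
    return threshold, labeled


# Deterministic toy pool of 200 points in [0, 1]; the seed covers both classes.
pool = [i / 199 for i in range(200)]
seed = [pool[0], pool[-1]]
threshold, labeled = active_learning(pool, seed, rounds=5, batch_size=4)
```

In this toy run only 22 of the 200 pool items are ever labeled, which is the mechanism behind the reduced annotation cost: the classifier asks for labels only where it is least certain. In a real setting `llm_annotate` would wrap an API call with a task-specific prompt, and `fit` would train the actual classifier.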

LLMaAA: Making Large Language Models as Active Annotators

Active Learning for NLP with Large Language Models

Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation

Large Language Models for Data Annotation: A Survey

Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost

LLMs Accelerate Annotation for Medical Information Extraction

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

Large Language Models for Data Annotation and Synthesis: A Survey

Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision

Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks

Automatically Generating CS Learning Materials with Large Language Models

Human-LLM Collaborative Annotation Through Effective Verification of LLM Labels

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Prototyping the use of Large Language Models (LLMs) for adult learning content creation at scale

Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency