Abstract: We present MEGAnno, a novel exploratory annotation framework designed for NLP researchers and practitioners. Unlike existing labeling tools that focus on data labeling only, our framework aims to support a broader, iterative ML workflow including data exploration and model development. With MEGAnno's API, users can programmatically explore the data through sophisticated search and automated suggestion functions and incrementally update the task schema as their projects evolve. Combined with our widget, users can interactively sort, filter, and assign labels to multiple items simultaneously in the same notebook where the rest of the NLP project resides. We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.

What problem does this paper attempt to address?

This paper addresses the limitations of existing data annotation tools in natural language processing (NLP) research and practice. Specifically, these problems include:

1. **The gap between ML tools**: Most existing tools are designed independently and focus on a single step of the machine-learning process, so researchers frequently have to switch contexts and transfer data in their daily work.
2. **Lack of customization and fine-grained control**: Not all data points are equally important. Users may wish to prioritize certain batches of data (for example, for better category or domain coverage, or to focus on data points that downstream models cannot predict well). Although some active-learning-based tools can suggest the next batch of data, most tools do not provide customization and fine-grained control in combination with downstream models.
3. **Lack of support for project evolution**: Current annotation tools usually assume that data collection tasks are clearly defined and immutable. They ignore that annotation projects can evolve during exploration, which makes such changes difficult to apply.

To solve these problems, the authors propose **MEGAnno**, a flexible, exploratory, efficient, and seamless data annotation framework designed to support the iterative workflow of NLP researchers and practitioners throughout the machine-learning life cycle. The main features of MEGAnno include:

- **Seamless integration**: It supports data pre-processing, annotation, analysis, model development, and evaluation in the same Jupyter Notebook.
- **Customizable interface**: Through rich heuristic searches, automatic suggestions, and active-learning-based suggestions for the next batch of data, it helps users guide the project in the desired direction.
- **Support for project evolution**: It is designed with a flexible task schema and provides a built-in analysis dashboard to assist decision-making.

Through these features, MEGAnno aims to bridge the gap between existing tools, provide a more flexible and efficient annotation experience, and support the continuous evolution of projects.
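To make the idea of active-learning-based suggestions for the next batch more concrete, here is a minimal sketch of uncertainty sampling, one common way to prioritize data points that a downstream model cannot predict well. This is not MEGAnno's API: the function names `entropy` and `suggest_next_batch`, the item ids, and the probability values are all hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def suggest_next_batch(predictions, labeled_ids, batch_size):
    """Return ids of the unlabeled items whose predictions are most uncertain.

    predictions: dict mapping item id -> list of class probabilities
    labeled_ids: set of item ids that already carry a label
    (Hypothetical helper, not part of any real annotation library.)
    """
    candidates = [
        (entropy(probs), item_id)
        for item_id, probs in predictions.items()
        if item_id not in labeled_ids
    ]
    # Highest entropy first: the model is least sure about these items.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in candidates[:batch_size]]


# Made-up sentiment probabilities (positive, negative) for four reviews.
predictions = {
    "r1": [0.98, 0.02],
    "r2": [0.55, 0.45],
    "r3": [0.50, 0.50],
    "r4": [0.80, 0.20],
}

# "r3" is already labeled, so it is skipped even though it is the most uncertain.
batch = suggest_next_batch(predictions, labeled_ids={"r3"}, batch_size=2)
print(batch)
```

In a notebook-based workflow, the returned ids would be the items shown to the annotator next. Other selection criteria, such as category or domain coverage, could replace the entropy score without changing the overall shape of the function.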

MEGAnno: Exploratory Labeling for NLP in Computational Notebooks
