Semi-Automatic Dataset Annotation Applied to Automatic Violent Message Detection

Beatriz Botella-Gil, Robiert Sepúlveda-Torres, Alba Bonet-Jover, Patricio Martínez-Barco, Estela Saquete
DOI: https://doi.org/10.1109/access.2024.3361404
IF: 3.9
2024-02-10
IEEE Access
Abstract: Annotated corpora are indispensable for training computational models in Artificial Intelligence and Natural Language Processing. However, manual annotation is costly, arduous, and time-consuming, especially when the annotation is semantically complex. To address this problem, this work applies a methodology for semi-automatic dataset annotation based on the Human-in-the-Loop paradigm. The methodology supports building a resource with fine-grained annotation to aid the detection of violent messages in Spanish sourced from social media (Twitter/X). Applying the methodology produced a high-quality resource, hereafter referred to as VILLANOS. The dataset is annotated incrementally, which increases annotator efficiency and thereby supports the suitability of the proposal. Annotation time was reduced by 52% compared to manual annotation, and a model trained on the VILLANOS dataset obtains a score of 85.2%. These results demonstrate the efficiency and effectiveness of the methodology.
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic
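The incremental, Human-in-the-Loop annotation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the word-count classifier, the function names (train, predict, annotate_incrementally), the label names, and the toy English messages are all assumptions made for illustration. The point is only the loop itself: a model pre-annotates each batch, a human accepts or corrects the suggestions, and the reviewed batch is added to the training data before the next batch.

```python
from collections import Counter

def train(labeled):
    """Toy stand-in for a classifier: count words seen under each label."""
    counts = {"violent": Counter(), "non_violent": Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Suggest a label by comparing per-label word counts (ties: non_violent)."""
    words = text.lower().split()
    violent = sum(model["violent"][w] for w in words)
    calm = sum(model["non_violent"][w] for w in words)
    return "violent" if violent > calm else "non_violent"

def annotate_incrementally(batches, review, seed):
    """Pre-annotate each batch, have a human review it, then retrain."""
    labeled = list(seed)
    corrections_per_batch = []
    for batch in batches:
        model = train(labeled)  # retrain on everything reviewed so far
        corrections = 0
        for text in batch:
            suggested = predict(model, text)
            final = review(text, suggested)  # human accepts or corrects
            if final != suggested:
                corrections += 1
            labeled.append((text, final))
        corrections_per_batch.append(corrections)
    return labeled, corrections_per_batch

# Hypothetical toy data; a lookup table simulates the human reviewer.
seed = [("i will hurt you", "violent"), ("have a nice day", "non_violent")]
batches = [
    ["gonna smash your face", "nice weather today"],
    ["gonna smash him", "what a nice day"],
]
gold = {
    "gonna smash your face": "violent",
    "nice weather today": "non_violent",
    "gonna smash him": "violent",
    "what a nice day": "non_violent",
}
labeled, corrections = annotate_incrementally(
    batches, lambda text, suggested: gold[text], seed
)
print(corrections)  # [1, 0]
```

In this toy run the reviewer corrects one suggestion in the first batch and none in the second, which mirrors the idea that annotator effort can drop as reviewed data accumulates. It says nothing about the 52% reduction reported in the abstract.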
What problem does this paper attempt to address?