Abstract: In this project, we investigated the use of advanced machine learning methods, specifically fine-tuned large language models, for pre-annotating data for a lexical extension task: adding descriptive words (verbs) to an existing but still incomplete ontology of event types. We focused on several research questions, from possible heuristics that give annotators at least hints about which verbs to include and which fall outside the current version of the ontology, to the possible use of the automatic scores to help annotators find, more efficiently, a threshold for identifying verbs that cannot be assigned to any existing class and should therefore serve as seeds for a new class. We also carefully examined the correlation of the automatic scores with the human annotation. While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity, even though the mere fact of such pre-annotation leads to relatively short annotation times.
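
The threshold idea in the abstract can be illustrated with a minimal sketch. It assumes each verb receives one score in [0, 1] per existing class; the cut-off values, the function name `triage`, and the example verbs and class names are all hypothetical, since the abstract gives no numeric thresholds.

```python
from typing import Dict, List

# Hypothetical cut-offs; the source does not state numeric thresholds.
LOW_THRESHOLD = 0.2
HIGH_THRESHOLD = 0.9


def triage(scores: Dict[str, Dict[str, float]],
           low: float = LOW_THRESHOLD,
           high: float = HIGH_THRESHOLD) -> Dict[str, List]:
    """Split verbs into three buckets based on their best per-class score."""
    buckets: Dict[str, List] = {
        "new_class_seed": [],   # no existing class fits: candidate seed for a new class
        "confirm_quickly": [],  # one class fits very well: quick human confirmation
        "review": [],           # everything in between: regular manual annotation
    }
    for verb, class_scores in scores.items():
        # Pick the existing class with the highest automatic score.
        best_class, best_score = max(class_scores.items(), key=lambda kv: kv[1])
        if best_score < low:
            buckets["new_class_seed"].append(verb)
        elif best_score >= high:
            buckets["confirm_quickly"].append((verb, best_class))
        else:
            buckets["review"].append((verb, best_class))
    return buckets


# Invented scores for illustration only.
example_scores = {
    "purchase": {"class_buy": 0.95, "class_give": 0.10},
    "yodel": {"class_buy": 0.03, "class_give": 0.05},
    "lend": {"class_buy": 0.30, "class_give": 0.60},
}
result = triage(example_scores)
print(result)
```

In this sketch only the single best score per verb is used; a real setup could equally look at the whole score distribution.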

What problem does this paper attempt to address?

The paper explores how large language models (LLMs) can help extend event-type ontologies, specifically by adding descriptive vocabulary (verbs) and new classes. Its main objectives and contributions can be summarized as follows:

1. **Research Background**: Ontology annotation is a high-dimensional, data-intensive task. In specialized fields such as linguistics or medicine it requires expert involvement, which makes manual annotation expensive and time-consuming. The researchers therefore aim to improve annotation efficiency through automated methods.
2. **Research Subject**: The paper takes the SynSemClass 4.0 ontology as its example, an event-type ontology containing synonymous verbs in multiple languages. The goal is to add new verbs and classes to the existing ontology.
3. **Research Methods**:
   - Use fine-tuned large language models to generate annotation suggestions.
   - Analyze verbs that receive low classification scores from the model as candidates for new classes.
   - Analyze high-confidence decisions to quickly confirm verbs that are strongly related to specific classes.
4. **Experimental Design**:
   - Fine-tune the RemBERT model and treat each class with a binary-classifier approach.
   - Validate the model's suggestions through manual annotation.
   - Compare annotation efficiency with and without the automatic scores shown to annotators.
5. **Main Findings**:
   - Agreement between the manual annotations is high, but Cohen's κ is low, mainly because most suggestions were rejected.
   - There is a strong positive correlation between the automatic scores and the manual annotations, indicating that the model's scores track the strength of the association between verbs and classes.
   - Providing automatic scores did not significantly improve annotation efficiency, but the pre-classification process as a whole allowed thousands of unassigned verbs to be annotated within a reasonable time frame.

In summary, the paper explores how machine learning can support the extension of specialized-domain ontologies, in particular by making manual annotation more efficient through automated pre-annotation. Although showing the automatic scores did not directly increase annotation speed, the scores gave annotators reference information, and the pre-annotation as a whole kept the overall annotation time short.
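
The combination of high agreement and low Cohen's κ can look contradictory, so here is a small worked sketch of how it can arise when almost every suggestion is rejected. The counts below are invented for illustration and are not taken from the paper; `cohen_kappa` is a hypothetical helper that implements the standard two-annotator formula κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter
from typing import Sequence, Tuple


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two annotators."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # p_o: fraction of items on which the two annotators gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # p_e: agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label]
                   for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return observed, (observed - expected) / (1 - expected)


# Invented data: 100 suggestions, most of them rejected by both annotators.
annotator_a = ["no"] * 90 + ["yes"] * 2 + ["yes"] * 4 + ["no"] * 4
annotator_b = ["no"] * 90 + ["yes"] * 2 + ["no"] * 4 + ["yes"] * 4

observed, kappa = cohen_kappa(annotator_a, annotator_b)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
# prints "observed agreement = 0.92, kappa = 0.29"
```

Because both annotators say "no" about 94% of the time, chance agreement alone is already about 0.89, which leaves κ little room even at 92% raw agreement.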

Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions
