Abstract: We address the challenges inherent in converting natural language (NL) requirements into machine-readable formats by investigating the application of named-entity recognition (NER) within the aerospace domain. Recognizing the necessity for domain-specific language models, we developed an open-source annotated aerospace corpus and fine-tuned different versions of the BERT language model on the corpus to create aeroBERT-NER: a new model for identifying named entities (NEs) in the aerospace domain. A comparison between aeroBERT-NER and [Formula: see text]-NER demonstrated the superior performance of aeroBERT-NER in identifying NEs within a set of aerospace requirements. The identified NEs contribute to the development of a glossary, promoting consistent terminology usage in aerospace requirements and addressing challenges associated with the standardization of NL requirements.

What problem does this paper attempt to address?

This paper addresses the problem of converting natural language (NL) requirements into machine-readable formats in the aerospace domain. Errors in requirements engineering can lead to high costs and system design failures. The industry is moving toward model-based approaches, but the ambiguity and inconsistency of NL requirements hinder that transition.

To address this, the researchers created an open-source annotated aerospace corpus and fine-tuned pre-trained BERT models on it, producing aeroBERT-NER, a domain-specific model that identifies named entities (NEs) in aerospace text. The identified NEs can be used to build a glossary automatically, which promotes consistent terminology, improves communication between stakeholders, raises requirement quality, and reduces ambiguity in NL aerospace requirements.

The main contributions of the paper include:

1. Creation of the first open-source aerospace named-entity recognition dataset.
2. Demonstration of methods for collecting, cleaning, and annotating aerospace named entities from regulations, publications, and similar sources.
3. Fine-tuning of the BERT model to identify named entities in the aerospace domain (aeroBERT-NER), even with a small annotated dataset.
4. Evidence of the potential of large language models for identifying named entities in the aerospace domain.

The research also highlights open issues: a lack of industrial case studies, a lack of open-source requirement datasets, and limited application of advanced natural language processing techniques in the aerospace domain.
The paper concludes with the implementation and results of the proposed method and discusses possible future research directions.
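
The glossary step described above can be illustrated with a minimal Python sketch. It assumes the NER model emits one BIO tag per token; the tag names used here (`SYS`, `ORG`), the helper names `extract_entities` and `build_glossary`, and the sample requirement are hypothetical and chosen for illustration only, not taken from the paper.

```python
from collections import defaultdict


def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes one tag per token in the form "B-TYPE", "I-TYPE", or "O".
    An "I-" tag that does not continue an open entity of the same type
    is treated like "O" and dropped.
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or a stray "I-" tag: close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


def build_glossary(tagged_requirements):
    """Collect unique, lower-cased entity strings per entity type."""
    glossary = defaultdict(set)
    for tokens, tags in tagged_requirements:
        for text, entity_type in extract_entities(tokens, tags):
            glossary[entity_type].add(text.lower())
    return {etype: sorted(terms) for etype, terms in glossary.items()}


# Hypothetical model output for a single requirement sentence.
tokens = ["The", "flight", "control", "system", "shall", "comply",
          "with", "FAA", "rules", "."]
tags = ["O", "B-SYS", "I-SYS", "I-SYS", "O", "O",
        "O", "B-ORG", "O", "O"]

glossary = build_glossary([(tokens, tags)])
print(glossary)
```

Lower-casing the terms is a design choice in this sketch so that surface variants of the same term collapse into one glossary entry; a real glossary would likely preserve the casing of acronyms.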

Development of a Language Model for Named-Entity-Recognition in Aerospace Requirements
