Development of a Language Model for Named-Entity-Recognition in Aerospace Requirements

Archana Tikayat Ray, Olivia J. Pinon Fischer, Ryan T. White, Bjorn F. Cole, Dimitri N. Mavris
DOI: https://doi.org/10.2514/1.i011251
2024-03-27
Journal of Aerospace Information Systems
Abstract: We address the challenges inherent in converting natural language (NL) requirements into machine-readable formats by investigating the application of named-entity recognition (NER) within the aerospace domain. Recognizing the necessity for domain-specific language models, we developed an open-source annotated aerospace corpus and fine-tuned different versions of the BERT language model on the corpus to create aeroBERT-NER: a new model for identifying named entities (NEs) in the aerospace domain. A comparison between aeroBERT-NER and [Formula: see text]-NER demonstrated the superior performance of aeroBERT-NER in identifying NEs within a set of aerospace requirements. The identified NEs contribute to the development of a glossary, promoting consistent terminology usage in aerospace requirements and addressing challenges associated with the standardization of NL requirements.
engineering, aerospace
What problem does this paper attempt to address?
This paper addresses the conversion of natural language (NL) requirements into machine-readable formats in the aerospace domain. The researchers developed aeroBERT-NER, a domain-specific language model obtained by fine-tuning BERT on an annotated aerospace corpus so that it identifies named entities (NEs) in aerospace text. The model is intended to improve the consistency of terminology in aerospace requirements and to support the standardization of NL requirements.
The paper notes that errors in requirements engineering can lead to high costs and system design failures. Industry is moving toward model-based approaches, but the ambiguity and inconsistency of NL requirements hinder that transition. In response, the researchers created an open-source aerospace NER dataset and fine-tuned pre-trained BERT models on it to adapt them to aerospace-specific terminology. The NEs identified by aeroBERT-NER can be used to build a glossary automatically, which improves communication among stakeholders, raises requirement quality, and reduces ambiguity in NL aerospace requirements.
The main contributions of the paper are:
1. Creation of the first open-source aerospace named-entity recognition dataset.
2. Demonstration of methods for collecting, cleaning, and annotating aerospace named entities from regulations, publications, and similar sources.
3. Fine-tuning of the BERT model to identify named entities in the aerospace domain (aeroBERT-NER), even with a small annotated dataset.
4. Demonstration of the potential of large language models for identifying named entities in the aerospace domain.
The research also highlights open issues in the field: a lack of industrial case studies, a shortage of open-source requirement datasets, and limited application of advanced natural language processing techniques in the aerospace domain.
The paper concludes with the implementation and results of the proposed method and discusses possible future research directions.
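To make the glossary-building step concrete, the following is a minimal sketch of how NER output could be turned into a glossary. It assumes the model emits one BIO-style tag per token, which is a common convention but is not confirmed by the text above. The tag names (`SYS`, `ORG`), the helper names `extract_entities` and `build_glossary`, and the hand-tagged example requirement are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict


def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags look like "B-SYS", "I-SYS", or "O" (hypothetical scheme).
    An "I-" tag that does not continue the current entity is treated as "O".
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


def build_glossary(tagged_requirements):
    """Collect unique entity strings per entity type across requirements."""
    glossary = defaultdict(set)
    for tokens, tags in tagged_requirements:
        for text, entity_type in extract_entities(tokens, tags):
            glossary[entity_type].add(text)
    # Sort terms so the glossary is stable across runs.
    return {etype: sorted(terms) for etype, terms in glossary.items()}


# Hand-tagged illustrative requirement (not real model output).
example = [
    (
        ["The", "flight", "control", "system", "shall", "comply",
         "with", "FAA", "regulations", "."],
        ["O", "B-SYS", "I-SYS", "I-SYS", "O", "O",
         "O", "B-ORG", "O", "O"],
    )
]

if __name__ == "__main__":
    print(build_glossary(example))
```

In practice the tags would come from a fine-tuned token-classification model rather than being written by hand, and subword tokens would need to be merged back into words before this grouping step.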