Semantic Encoding Algorithm for Classification and Retrieval of Aviation Safety Reports

Yubing Gao, GuangYu Zhu, Ya Duan, Jianfeng Mao
DOI: https://doi.org/10.1109/tase.2024.3359356
IF: 6.636
2024-01-01
IEEE Transactions on Automation Science and Engineering
Abstract: Automated analysis of aviation safety reports helps prevent future accidents and improve emergency response capabilities. To date, there are no publicly available large-scale aviation text similarity datasets, which hinders the successful application of NLP techniques in the aviation domain. We present an automatically created aviation text similarity dataset consisting of more than 500,000 pairs for fine-tuning pretrained language models. Since technical terms have specialized meanings that differ from everyday language, we propose an efficient semantic encoding algorithm to improve the ability of embeddings to adequately represent aviation terms. We provide new solutions and revised evaluation metrics for the classification and retrieval of safety reports, confirming the reliability of our dataset and the superiority of our algorithm. Note to Practitioners: Text representation is an essential task in natural language processing (NLP). A crucial step towards the successful application of NLP in safety report analysis is to ensure that aviation texts are adequately encoded. To address the poor ability of current embeddings to represent technical terms, we automatically create an aviation text similarity dataset and propose a semantic encoding algorithm for aviation terms. The proposed method shows great potential for representing technical terms, thus assisting downstream tasks such as text classification, information retrieval, and question answering.
automation & control systems