A Natural Language Processing System for Text Classification Corpus Based on Machine Learning

Yawen Su
DOI: https://doi.org/10.1145/3648361
IF: 1.471
2024-02-19
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract:A classification system for hazardous materials in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing to prevent hazardous situations in air traffic control. Based on the development of the HFACS standard, an air traffic control hazard classification system will be created. The dangerous data of the aviation safety management system is selected by dead bodies, classified and marked in 5 levels. TFIDF TextRank text classification method based on key content extraction and text classification model based on CNN and BERT model were used in the experiment to solve the problem of small samples, many labels and random samples in hazardous environment of air pollution control. The results show that the total cost of model training time and classification accuracy is the highest when the keywords are around 8. As the number of points increases, the time spent in dimensioning decreases and affects accuracy. When the number of points reaches about 93, the time spent in determining the size increases, but the accuracy of the allocation remains close to 0.7, but the increase in the value of time leads to a decrease in the total cost. It has been proven that extracting key content can solve text classification problems for small companies and contribute to further research in the development of security systems.
computer science, artificial intelligence
What problem does this paper attempt to address?