Railway Fault Text Clustering Method Using an Improved Dirichlet Multinomial Mixture Model

Ni Yang, Youpeng Zhang
DOI: https://doi.org/10.1155/2022/7882396
IF: 1.43
2022-07-06
Mathematical Problems in Engineering
Abstract: Railway signal equipment fault data (RSEFD) pose one of the challenges for in-depth traffic big data analysis across the life cycle of intelligent transportation. During daily operation and maintenance, the railway electrical maintenance department records equipment malfunction information in natural language. These data are highly domain-specific, consist of short texts, and have imbalanced categories, and analyzing and processing them manually is inefficient. Effectively mining the information contained in these fault texts is therefore important for supporting on-site operation and maintenance. To this end, we propose a railway fault text clustering method using an improved Dirichlet multinomial mixture model, called ICH-GSDMM. First, a railway signal terminology thesaurus is established to overcome the inaccurate segmentation of RSEFD. Second, the traditional chi-square statistic is improved to overcome the learning difficulties caused by the class imbalance of RSEFD. Finally, the Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) is modified using the improved chi-square statistic (ICH) to overcome the symmetry of the word Dirichlet prior parameters in the traditional GSDMM. Compared with the traditional GSDMM model and the GSDMM model based on chi-square statistics (CH-GSDMM), the quantitative experimental results show that the GSDMM model based on improved chi-square statistics (ICH-GSDMM) greatly improves the internal evaluation indices of clustering performance, and its external evaluation indices are also the best, except for the external index NMI on data set DS2. In addition, the diagnostic accuracy for a few categories in RSEFD improves considerably, demonstrating the method's efficacy.
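The abstract does not say which word segmenter the terminology thesaurus is used with. A minimal sketch of the general idea, assuming plain dictionary-based forward maximum matching, is shown below in Python: at each position the longest thesaurus term that matches is taken, and otherwise a single character is emitted. The function name and the example terms (转辙机 "switch machine", 表示电路 "indication circuit", 故障 "fault") are hypothetical illustrations, not the paper's thesaurus.

```python
def forward_max_match(text: str, thesaurus: set[str]) -> list[str]:
    """Greedy forward maximum matching (hypothetical helper, not from the paper).

    At each position, try the longest candidate first and shrink it until a
    thesaurus term matches; a single character is always accepted as a fallback.
    """
    max_len = max((len(term) for term in thesaurus), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in thesaurus:
                tokens.append(piece)
                i += length
                break
    return tokens


if __name__ == "__main__":
    terms = {"转辙机", "表示电路", "故障"}  # hypothetical railway-signal terms
    print(forward_max_match("转辙机表示电路故障", terms))
    # ['转辙机', '表示电路', '故障']
```

With an empty thesaurus the same string falls apart into single characters, which is the kind of inaccurate segmentation a domain thesaurus is meant to prevent. Practical segmenters usually combine a user dictionary with a statistical model rather than relying on greedy matching alone.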
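For reference, the traditional chi-square statistic that the method starts from scores the association between a term t and a category c from a 2×2 table of document counts: A documents in c contain t, B documents outside c contain t, C documents in c lack t, D documents outside c lack t, and χ²(t, c) = N(AD - BC)² / ((A + B)(C + D)(A + C)(B + D)) with N = A + B + C + D. The abstract does not give the form of the improved statistic (ICH), so the Python sketch below computes only the standard version; the function name, token sets, and labels are hypothetical.

```python
def chi_square(docs, labels, term, category):
    """Standard (unimproved) chi-square score of `term` for `category`.

    docs:   list of token sets, one per document
    labels: category label of each document
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # in category, contains term
        elif has_term:
            b += 1  # outside category, contains term
        elif in_cat:
            c += 1  # in category, lacks term
        else:
            d += 1  # outside category, lacks term
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


if __name__ == "__main__":
    docs = [{"switch", "fault"}, {"switch", "indication"},
            {"track", "fault"}, {"track", "occupancy"}]  # hypothetical records
    labels = ["switch", "switch", "track", "track"]
    print(chi_square(docs, labels, "switch", "switch"))  # 4.0
    print(chi_square(docs, labels, "fault", "switch"))   # 0.0
```

A term spread evenly over categories scores 0, while a term confined to one category scores highest. According to the abstract, the improved statistic is designed to counter the learning difficulties that class imbalance causes; its exact form is not given here.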
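The key change to GSDMM described in the abstract is replacing the symmetric word prior β with word-specific values. As a minimal sketch, assuming the standard collapsed Gibbs sampler for a Dirichlet multinomial mixture, the Python below takes a dictionary beta that maps every vocabulary word w to its own pseudo-count β_w. A document d is reassigned to cluster k with probability proportional to (m_k + α) · Π_{w in d} Π_{j=0}^{N_d^w - 1} (n_k^w + β_w + j) / Π_{i=0}^{N_d - 1} (n_k + Σ_v β_v + i), where m_k, n_k, and n_k^w are the document, token, and word counts of cluster k with d removed. How ICH scores are mapped to β_w is not stated in the abstract, so the function name, parameters, and the uniform β in the usage are hypothetical.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K, alpha, beta, n_iters=30, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet multinomial mixture with a
    per-word (possibly asymmetric) prior. `beta` must cover every word."""
    rng = random.Random(seed)
    beta_sum = sum(beta.values())        # sum of beta_v over the vocabulary
    counts = [Counter(doc) for doc in docs]
    m = [0] * K                          # documents per cluster
    n = [0] * K                          # tokens per cluster
    nw = [Counter() for _ in range(K)]   # word counts per cluster
    z = [0] * len(docs)

    def update(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w, c in counts[d].items():
            nw[k][w] += sign * c

    for d in range(len(docs)):           # random initial assignment
        z[d] = rng.randrange(K)
        update(d, z[d], +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            update(d, z[d], -1)
            log_p = []
            for k in range(K):
                # The (D - 1 + K*alpha) denominator is the same for all k.
                lp = math.log(m[k] + alpha)
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta[w] + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[k] + beta_sum + i)
                log_p.append(lp)
            top = max(log_p)             # subtract max for numerical stability
            weights = [math.exp(v - top) for v in log_p]
            z[d] = rng.choices(range(K), weights=weights)[0]
            update(d, z[d], +1)
    return z


if __name__ == "__main__":
    docs = [["switch", "fault"]] * 3 + [["track", "occupancy"]] * 3
    beta = {w: 0.1 for doc in docs for w in doc}  # uniform = symmetric GSDMM
    print(gsdmm(docs, K=4, alpha=0.1, beta=beta))
```

With every entry of beta equal, this reduces to ordinary symmetric GSDMM. An asymmetric prior lets a statistic such as chi-square give different words different pseudo-counts; the direction and scale of that mapping in ICH-GSDMM are defined in the paper, not here.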
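NMI, the external index singled out for data set DS2, compares cluster assignments with reference categories. The abstract does not state which normalization is used; the Python sketch below assumes the geometric-mean form NMI = I(Y; Z) / sqrt(H(Y) H(Z)), computed from label counts. The function name is hypothetical; in typical cases it should agree with scikit-learn's normalized_mutual_info_score called with average_method="geometric".

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information, geometric-mean normalization (assumed)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information I(Y; Z) from joint and marginal counts.
    mi = sum(c / n * math.log(n * c / (count_true[a] * count_pred[b]))
             for (a, b), c in joint.items())
    h_true = -sum(c / n * math.log(c / n) for c in count_true.values())
    h_pred = -sum(c / n * math.log(c / n) for c in count_pred.values())
    if h_true == 0 or h_pred == 0:
        # Degenerate single-cluster cases: identical only if both are trivial.
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)


if __name__ == "__main__":
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # ≈ 1.0 (same partition, relabeled)
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent partitions)
```

Because NMI is invariant to relabeling clusters, a clustering that reproduces the categories exactly scores 1 even if its cluster IDs differ from the category IDs.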
Engineering, Multidisciplinary; Mathematics, Interdisciplinary Applications