Research on Chinese Word Segmentation Algorithm based on Dictionary and Statistical Method and Its Application in the field of Power Grid Control
Kun Huang, Zhen Yan, Chao Shen, Kun Zhao, Mengge Mao, Jinxia Dai
DOI: https://doi.org/10.1109/GCRAIT55928.2022.00040
2022-07-01
Abstract: Chinese word segmentation in the professional field of power grid regulation poses particular difficulties. Segmentation based on traditional statistical algorithms is frequently limited by the training corpus available in the proprietary field, while segmentation based on dictionary algorithms is weak at identifying new words and resolving ambiguity in that field. This paper proposes a Chinese word segmentation method that combines dictionary and statistical methods, and applies it to the scenario of querying the operation status of the power grid. The method constructs a professional dictionary for the field of power grid regulation using the Double-Array Tree storage structure. A CRF (Conditional Random Fields) model is then trained on the text data to generate a model file, and the FMM (Forward Maximum Matching) method is used for Chinese word segmentation in combination with the professional dictionary. A fast and accurate online word segmentation method for the field of power grid regulation is designed and implemented, and it performs well on several main performance indexes. In the scenario of querying the power grid operation status, the segmentation results obtained by this method, combined with standard query sentence patterns, enable rapid classification of queries and accurate execution of operation commands. This greatly improves the efficiency and accuracy of querying the power grid operation status and improves the human-computer interaction experience.
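The FMM step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a plain Python set stands in for the Double-Array Tree dictionary, the CRF model is omitted entirely, and the function name `fmm_segment` and the dictionary entries in `GRID_TERMS` are hypothetical, chosen only for illustration.

```python
def fmm_segment(text, dictionary, max_len=None):
    """Forward Maximum Matching: scan left to right and, at each position,
    take the longest dictionary word that starts there."""
    if max_len is None:
        # Longest dictionary entry bounds the matching window.
        max_len = max((len(word) for word in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so unknown text is emitted one character at a time.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical domain dictionary (assumption, not taken from the paper).
GRID_TERMS = {"变电站", "母线", "电压", "查询"}

if __name__ == "__main__":
    print(fmm_segment("查询变电站母线电压", GRID_TERMS))
```

In this sketch, out-of-dictionary text falls back to single characters. That fallback is the kind of gap a statistical model such as the CRF described in the abstract would be expected to cover, although how the paper combines the two is not specified here.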
Engineering, Computer Science