Application of the maintenance text data of transformers based on SimHash and Hamming distance algorithm

Yao Yuan, Ruihai Li, Yingjie Wang, Tienan Cao, Jiahui Yang, Yuan La
DOI: https://doi.org/10.1109/ichve49031.2020.9279852
2020-09-06
Abstract: Power companies have accumulated a large amount of power-equipment maintenance data in text format, and text mining is needed to extract valuable information from it. Text similarity is an important method in text mining; however, the dimensionality of the feature space grows as texts get longer. In this paper, the SimHash algorithm maps each original text to a 64-bit binary fingerprint, and the similarity between two texts is then measured by the Hamming distance between their fingerprints. Using this method, a recommendation model for maintenance measures is established. Verification results show that the recommendation model based on the SimHash and Hamming distance algorithms gives good predictions and has potential for practical application.
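The fingerprint-and-match idea in the abstract can be sketched as follows. This is a minimal illustration of the standard weighted bit-voting form of SimHash, not the authors' implementation. The regex tokenizer, the raw term-frequency weights, the MD5-derived 64-bit token hash, and the `recommend` helper (which picks the historical record with the smallest Hamming distance) are all assumptions. Chinese maintenance records would in practice need a word segmenter and probably TF-IDF weights.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint length stated in the abstract


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens. Unsegmented Chinese text would come out
    # as one long token here, so a real system would use a word segmenter.
    return re.findall(r"\w+", text.lower())


def token_hash(token: str) -> int:
    # Assumption: a stable 64-bit hash taken from the first 8 bytes of MD5
    # (Python's built-in hash() is salted per process, so it is avoided).
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    # Weighted bit voting: each token adds its weight to bit i when its hash
    # has a 1 there and subtracts it otherwise.
    weights = Counter(tokenize(text))  # assumption: term frequency as weight
    votes = [0] * BITS
    for token, w in weights.items():
        h = token_hash(token)
        for i in range(BITS):
            votes[i] += w if (h >> i) & 1 else -w
    # Fingerprint bit i is 1 exactly where the weighted vote is positive.
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of bit positions in which the two fingerprints differ.
    return bin(a ^ b).count("1")


def recommend(query: str, records: list[tuple[str, str]]) -> tuple[str, int]:
    # Hypothetical recommender: records are (fault description, maintenance
    # measure) pairs; return the measure of the record whose fingerprint is
    # nearest to the query's, together with that Hamming distance.
    q = simhash(query)
    scored = [(hamming(q, simhash(text)), measure) for text, measure in records]
    dist, measure = min(scored, key=lambda s: s[0])
    return measure, dist
```

For example, `recommend(new_description, history)` returns a suggested measure for a new fault description, where `history` is a hypothetical list of past (description, measure) pairs. The linear scan keeps the sketch short; each text is reduced to a single 64-bit integer, so comparisons cost one XOR and a popcount regardless of text length. A practical system would likely also reject matches above some distance threshold, a value the abstract does not give.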