Improved BTM topic embedding method for Web text data extraction
Fengcui Zhang
DOI: https://doi.org/10.1016/j.entcom.2024.100642
IF: 2.072
2024-05-01
Entertainment Computing
Abstract: With the popularity of social platforms such as Facebook and Twitter, the volume of online text data has grown explosively. However, because this text is sparse and imbalanced, it is difficult to accurately analyze and process the information it carries, such as emotion. To address this, a Web text data extraction model tailored to the characteristics of Web text is proposed. It is based on the biterm topic model, a bidirectional recurrent neural network, topic embedding, and an attention mechanism, and is referred to as the BTM-BRNN-TEAM model; its aim is to improve the accuracy of emotion analysis in short text. The test results show that when the top 10 topic words are selected, the model's recognition accuracy, error rate, precision, recall, and F1-measure are about 77.6%, 22.4%, 83.4%, 76.4%, and 0.82, respectively, which are better than those of the other two models compared. The area under the receiver operating characteristic curve of the model is about 0.91, which indicates strong classification performance. When processing short text sentences, the proposed model accurately identifies the aspectual words that have a significant impact on the text while weakening the influence of non-aspectual words. These results show that the Web text processing model proposed in this study performs well in sentiment analysis and semantic recognition.
computer science, cybernetics, interdisciplinary applications, software engineering
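
The abstract reports accuracy, error rate, precision, recall, and F1-measure. As a reminder of how these quantities relate, the following is a minimal sketch that computes them from binary confusion-matrix counts. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation protocol (for example, how multi-class results are averaged) is not specified in the abstract.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives, and true negatives (hypothetical inputs).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total  # complement of accuracy

    # Guard against empty denominators when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only; not data from the paper.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Because accuracy and error rate are complements, they always sum to 1, which matches the 77.6% and 22.4% figures reported in the abstract.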