Resource Construction and Ensemble Learning based Sentiment Analysis for the Low-resource Language Uyghur

Azragul Yusup Azragul Yusup,Degang Chen Azragul Yusup,Yifei Ge Degang Chen,Hongliang Mao Yifei Ge,Nujian Wang Hongliang Mao
DOI: https://doi.org/10.53106/160792642023072404018
2023-07-01
網際網路技術學刊
Abstract:To address the problem of scarce low-resource sentiment analysis corpus nowadays, this paper proposes a sentence-level sentiment analysis resource conversion method HTL based on the syntactic-semantic knowledge of the low-resource language Uyghur to convert high-resource corpus to low-resource corpus. In the conversion process, a k-fold cross-filtering method is proposed to reduce the distortion of data samples, which is used to select high-quality samples for conversion; finally, the Uyghur sentiment analysis dataset USD is constructed; the Baseline of this dataset is verified under the LSTM model, and the accuracy and F1 values reach 81.07% and 81.13%, respectively, which can provide a reference for the construction of low-resource language corpus nowadays. The accuracy and F1 values reached 81.07% and 81.13%, respectively, which can provide a reference for the construction of today’s low-resource corpus. Meanwhile, this paper also proposes a sentiment analysis model based on logistic regression ensemble learning, SA-LREL, which combines the advantages of several lightweight network models such as TextCNN, RNN, and RCNN as the base model, and the meta-model is constructed using logistic regression functions for ensemble, and the accuracy and F1 values reach 82.17% and 81.86% respectively in the test set, and the experimental results show that the method can effectively improve the performance of Uyghur sentiment analysis task.  
computer science, information systems,telecommunications
What problem does this paper attempt to address?