Convolutional Neural Networks for Sentiment Analysis on Weibo Data: A Natural Language Processing Approach

Yufei Xie, Rodolfo C. Raga Jr.
DOI: https://doi.org/10.48550/arXiv.2307.06540
2023-07-13
Abstract: This study addresses sentiment analysis on a dataset of 119,988 original Weibo posts using a Convolutional Neural Network (CNN), offering a new approach to this Natural Language Processing (NLP) task. The data, sourced from Baidu's PaddlePaddle AI platform, were preprocessed, tokenized, and categorized by sentiment label. The CNN-based model uses word embeddings for feature extraction and is trained to perform sentiment classification. It achieved a macro-average F1-score of approximately 0.73 on the test set, with balanced performance across positive, neutral, and negative sentiments. The findings underscore the effectiveness of CNNs for sentiment analysis, with implications for practical applications in social media analysis, market research, and policy studies. The complete experimental content and code are publicly available on the Kaggle data platform for further research and development. Future work may explore other architectures, such as Recurrent Neural Networks (RNNs) or transformers, or more complex pre-trained models such as BERT, to improve the model's ability to capture linguistic nuance and context.
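The macro-average F1-score reported in the abstract is the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of how many examples it has. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1`, the label strings, and the toy predictions are illustrative assumptions, not taken from the paper's code or data.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (not from the Weibo dataset).
y_true = ["pos", "pos", "neu", "neg", "neg", "neg"]
y_pred = ["pos", "neu", "neu", "neg", "neg", "pos"]
score = macro_f1(y_true, y_pred, labels=["pos", "neu", "neg"])
print(round(score, 4))
```

Because every class contributes equally, a model that does well only on the most frequent sentiment would score poorly under this metric, which is why it is a reasonable summary of balanced performance.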
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses sentiment analysis on Weibo data. The researchers apply Convolutional Neural Networks (CNNs) to a dataset from the popular Chinese platform Weibo, both to explore how CNNs can be used in natural language processing and to evaluate their performance on sentiment classification. The main objective is to improve sentiment analysis on Weibo data with a CNN, with particular attention to how well it performs across different topics. The study also considers the characteristics of the Chinese language and their effect on sentiment analysis, and proposes a method that captures local dependencies in the text to make sentiment prediction more accurate. In doing so, the research aims to provide more accurate and detailed insights into public opinion for fields such as social media analysis, market research, and policy research.
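To illustrate what "capturing local dependencies" means for a text CNN, here is a minimal sketch of a single convolutional filter sliding over consecutive token embeddings, followed by ReLU and max-over-time pooling. It is a toy illustration in plain Python, not the paper's model: the function name `conv1d_max_pool`, the 2-dimensional embeddings, and the filter weights are all assumptions chosen for readability, and the summary above does not state the actual filter widths or embedding size.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Apply one filter to each window of consecutive tokens, then max-pool.

    embeddings: list of token vectors, shape (sequence_length, embedding_dim)
    kernel:     list of weight rows, shape (filter_width, embedding_dim)
    Returns (pooled_value, per_window_activations).
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias
        # Dot product between the filter and the tokens inside this window.
        for token_vec, kernel_row in zip(window, kernel):
            total += sum(x * w for x, w in zip(token_vec, kernel_row))
        activations.append(max(0.0, total))  # ReLU
    # Max-over-time pooling keeps the strongest response anywhere in the text.
    return max(activations), activations


# Hypothetical 4-token sentence with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One hypothetical filter spanning two adjacent tokens.
bigram_filter = [[1.0, -1.0], [0.5, 0.5]]
pooled, acts = conv1d_max_pool(tokens, bigram_filter)
print(pooled, acts)
```

Each activation depends only on a short window of adjacent tokens, which is how a filter can respond to a local pattern such as a two-word phrase wherever it occurs. A full model would typically use many such filters, possibly of several widths, and feed the pooled values to a classifier over the three sentiment classes.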