Umme Faria Moon, MD Ahsan Habib Rasel, Md. Musfique Anwar
Abstract: The use of social media platforms has been gradually increasing, and the spread of fake news is becoming an alarming issue. Spreading fake news means disseminating false, confusing, and spurious information that hurts families, communities, and others. The issue therefore needs to be resolved soon so that the spread of fake news in the virtual world can be limited, and addressing it requires identifying fake news spreaders. In this research, we try to identify the users who are most likely to share fake news and to predict how far shared pieces of fake news spread in the social network. We take into account user information, such as follower counts, like counts, and retweet counts, along with users' topical interests and their connection strength, measured by the follower-following ratio. We also consider the complexity features, stylistic features, and psychological effects of the news. Finally, we apply different machine-learning algorithms to evaluate the performance of the proposed model. Our observation is that news shared by users with more followers, likes, and retweets (i.e., influential users) is more likely to spread than news shared by other users.
What problem does this paper attempt to address?
This paper addresses the spread of fake news on social media platforms. Specifically, the authors focus on identifying the users who are most likely to share fake news and on predicting how that news spreads through social networks. Through this research, the authors hope to limit the spread of fake news in the virtual world and reduce its negative impact on families and communities.
### Research Background and Problem Description
With the popularization of social media platforms (such as Facebook, Twitter, etc.), the spread of fake news has become an increasingly serious problem. Fake news refers to false and misleading information, which may cause serious harm to society. Therefore, how to effectively identify and control the spread of fake news has become an urgent problem to be solved.
### Research Objectives
1. **Identify Fake News Spreaders**: Determine which users are most likely to share fake news.
2. **Predict the Spread of Fake News**: Estimate how widely a piece of fake news will spread, measured by the number of retweets it receives.
### Research Methods
To achieve the above objectives, the authors took the following steps:
1. **Data Collection and Preprocessing**:
- Use the FibVID dataset, which contains true and false news items, both COVID-19-related and non-COVID-19, together with information on how they spread.
- Extract user features, including the number of followers, the number of likes, the number of retweets, and each user's degree of interest in different topics.
2. **Feature Extraction**:
- **Complexity Features**: Use indicators such as the SMOG index, lexical diversity, and average word length to measure the complexity of news.
- **Psychological Features**: Analyze the sentiment polarity (positive, negative, neutral) and subjectivity of news.
- **Style Features**: Consider the writing style and the use of personal and non-personal pronouns.
3. **Model Construction**:
- Use T-LDA (Twitter LDA) for topic modeling to determine the interest distribution of users.
- Apply multiple machine-learning algorithms (such as Random Forest, XGBoost, Logistic Regression, etc.) to classify news as true or false and predict the spread of fake news.
4. **Model Evaluation**:
- Evaluate the performance of the classification model through indicators such as accuracy, precision, recall, and F1-score.
- Use RMSE (Root Mean Square Error) and R² value to evaluate the performance of the regression model.
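As a rough illustration of the feature-extraction step above, the sketch below computes two of the complexity features (lexical diversity and average word length) and one style feature (personal pronoun count) using only the Python standard library. The tokenizer, the pronoun list, and the function names are assumptions made for this example and are not taken from the paper; the SMOG index is omitted because it requires syllable counting.

```python
import re

# Hypothetical pronoun list for illustration only; the paper's exact list is not given here.
PERSONAL_PRONOUNS = {
    "i", "me", "my", "we", "us", "our", "you", "your",
    "he", "him", "his", "she", "her", "they", "them", "their",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def complexity_and_style_features(text):
    """Return lexical diversity, average word length, and personal pronoun count."""
    words = tokenize(text)
    if not words:
        # Empty input: return neutral values instead of dividing by zero.
        return {"lexical_diversity": 0.0, "avg_word_length": 0.0, "personal_pronouns": 0}
    return {
        # Unique words divided by total words (type-token ratio).
        "lexical_diversity": len(set(words)) / len(words),
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Number of tokens that appear in the assumed pronoun list.
        "personal_pronouns": sum(w in PERSONAL_PRONOUNS for w in words),
    }


features = complexity_and_style_features("We saw the news and we shared the news")
print(features)
```

In a full pipeline, features like these would be joined with the user-level features and topic distributions before being passed to the classifiers and regressors.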
### Main Findings
- **Classification Results**: The Random Forest classifier performs best, with an accuracy of 92.05% and an F1-score of 92.05%.
- **Regression Results**: XGBRegressor outperforms the Random Forest regressor in predicting the number of retweets, with an R² value of 0.86 and an RMSE of 3309.18.
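For readers unfamiliar with the metrics behind these numbers, the following minimal sketch shows the standard definitions of RMSE, R², and F1-score in plain Python. The function names and the toy values are hypothetical and unrelated to the FibVID results; they only illustrate how such scores are computed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1_score(labels, preds):
    """F1-score for binary labels, where 1 marks the positive (fake) class."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy retweet counts (illustrative values only).
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(rmse(actual, predicted), r2_score(actual, predicted))
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Because RMSE is expressed in the same unit as the target (here, retweets), its magnitude depends on the scale of retweet counts in the data, whereas R² is scale-free.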
### Conclusions
Through this research, the authors identify potential fake news spreaders and predict the spread of fake news with relatively high accuracy. Future research can be extended to analyze multi-layer network structures to understand the spread paths and reach of fake news more comprehensively.
### Formula Display
- **Twitter Follower Following Ratio (TFF)**:
\[
TFF=\frac{\text{\#Follower}+1}{\text{\#Followee}+1}
\]
This ratio is used as a measure of a user's connection strength and influence. Adding 1 to both the numerator and the denominator avoids division by zero when a user follows no one.
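A minimal sketch of the TFF computation follows. The function name `tff_ratio` and the sample counts are hypothetical and chosen only to illustrate the formula, including the case where the followee count is zero.

```python
def tff_ratio(followers: int, followees: int) -> float:
    """Twitter follower-following ratio with add-one smoothing."""
    if followers < 0 or followees < 0:
        raise ValueError("counts must be non-negative")
    # The +1 terms keep the denominator non-zero when followees == 0.
    return (followers + 1) / (followees + 1)


print(tff_ratio(999, 99))  # many followers, few followees
print(tff_ratio(50, 0))    # follows no one: still well defined
print(tff_ratio(0, 0))     # brand-new account
```

A value well above 1 indicates an account that is followed by many more users than it follows, which matches the notion of an influential user described in the abstract.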
Taken together, these methods offer a practical basis for understanding and limiting the spread of fake news on social media.