Social Media Sentiment Analysis Using Twitter Dataset

Aditi Jadeja, Naik Ramesh Ram, Nitin Rathore, Sunil Gautum, Hardi Joisar
DOI: https://doi.org/10.1109/IC-CGU58078.2024.10530694
2024-03-01
Abstract: This study delves into Twitter hate speech detection, emphasizing sentiment analysis and machine learning model performance. Data preprocessing ensures data integrity through label validation and pattern removal. Through exploratory data analysis, word clouds reveal the top 30 frequently used words and 20 common hashtags, providing insight into prevalent sentiments and themes. Feature engineering involves tokenization using the Gensim Word2Vec model, sentiment labeling, and stop word removal for improved text quality and consistency. Four machine learning models (Random Forest, Logistic Regression, Decision Tree, and Support Vector Classifier) are employed for hate speech prediction, with the training dataset divided into training and validation sets. The results are striking, with Random Forest and Support Vector Classifier models achieving a remarkable 95 percent accuracy, closely followed by Logistic Regression and Decision Tree models with accuracies of 94 percent and 93 percent, respectively. This research underscores the potential of sentiment analysis in hate speech detection and offers valuable insights for combating hate speech on social media platforms.
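The preprocessing and exploratory steps named in the abstract (label validation, pattern removal, stop word removal, and counting frequent words and hashtags) might look roughly like the following standard-library sketch. It is not the authors' code: the abstract does not say which patterns were removed, which stop word list was used, or how labels were encoded, so the `@handle` pattern, the tiny `STOP_WORDS` set, the 0/1 label scheme, and the sample tweets are all assumptions made for illustration. The Word2Vec features and the four classifiers are not shown.

```python
import re
from collections import Counter

# Hypothetical stop word list; the abstract does not specify the one used.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "for", "so"}

# Assumed patterns: user handles to strip, hashtags to collect.
HANDLE_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def validate_labels(rows, allowed=(0, 1)):
    """Keep only (text, label) rows whose label is in the allowed set."""
    return [(text, label) for text, label in rows if label in allowed]


def clean_tweet(text):
    """Lowercase, drop @handles, keep alphabetic tokens, remove stop words."""
    text = HANDLE_RE.sub(" ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [h.lower() for h in HASHTAG_RE.findall(text)]


def top_k(items, k):
    """Return the k most frequent items as (item, count) pairs."""
    return Counter(items).most_common(k)


# Made-up sample data; the third row has an invalid label and is dropped.
tweets = [
    ("@user thanks for the #love and #support", 0),
    ("so much #love in the room today", 0),
    ("@user @user this label is broken", 2),
]

valid = validate_labels(tweets)
words = [w for text, _ in valid for w in clean_tweet(text)]
hashtags = [h for text, _ in valid for h in extract_hashtags(text)]

print(top_k(words, 30))     # the abstract reports the top 30 words
print(top_k(hashtags, 20))  # and the top 20 hashtags
```

In a fuller pipeline, the token lists produced by `clean_tweet` would be the natural input to an embedding model before the train/validation split described in the abstract.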
Computer Science