Easy Data Augmentation in Sentiment Analysis of Cyberbullying

Alwan Wirawan, Hasan Dwi Cahyono, Winarno
2023-11-29
Abstract: Instagram, a social media platform, has about 2 billion active users as of 2023. The platform allows users to share photos and videos with one another. However, cyberbullying remains a significant problem for about 50% of young Indonesians. To address this issue, we use sentiment analysis for comment filtering, built on a Support Vector Machine (SVM) and Easy Data Augmentation (EDA). EDA augments the dataset and introduces more variation, enabling more robust prediction and analysis of cyberbullying. Based on the tests, combining SVM with EDA results in a 2.52% increase in the k-Fold Cross Validation score. Our proposed approach shows an improved accuracy of 92.5%, 2.5% higher than that of the existing state-of-the-art method. To maintain the reproducibility and replicability of this research, the source code can be accessed at http://uns.id/eda_svm.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the issue of cyberbullying on social media platforms. Specifically, the researchers utilize Support Vector Machine (SVM) combined with Easy Data Augmentation (EDA) to perform sentiment analysis on comments on Instagram, in order to filter out comments that may involve cyberbullying.

#### Main Issues:

1. **Cyberbullying Issue**: On social platforms like Instagram, cyberbullying remains a serious problem, especially among young people in Indonesia, where approximately 50% of young people have experienced cyberbullying.
2. **Challenges of Small Datasets**: Existing studies often use smaller datasets (fewer than 1280 samples per category), making it difficult for models to learn the fundamental patterns of the data, thereby affecting the model's generalization ability.

#### Solutions:

- **Sentiment Analysis**: Identifying and filtering out comments involving cyberbullying through sentiment analysis.
- **Support Vector Machine (SVM)**: Using SVM as the classification algorithm.
- **Easy Data Augmentation (EDA)**: Introducing simple data augmentation techniques to improve the model's predictive performance by increasing the variability of the dataset.

#### Experimental Results:

- The method combining SVM with EDA improved the k-fold cross-validation score by 2.52%, reaching 89.74%.
- The model's accuracy reached 92.5%, which is 2.5% higher than the existing best method.

With these improvements, the researchers hope to more effectively identify and filter out comments related to cyberbullying on social media platforms, thereby reducing the psychological harm caused by cyberbullying.
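
To make the EDA step listed under Solutions more concrete, here is a minimal sketch of the four operations EDA is generally described as consisting of: synonym replacement, random insertion, random swap, and random deletion. It is not the authors' implementation. The summary does not say which operations or parameters they used, so the `SYNONYMS` table, the function names, and the `alpha` and `p_delete` defaults are all illustrative assumptions.

```python
import random

# Hypothetical toy thesaurus (an assumption, not from the paper). A real
# pipeline would need a proper synonym resource for the target language.
SYNONYMS = {
    "bad": ["awful", "terrible"],
    "ugly": ["hideous"],
    "stupid": ["dumb", "foolish"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """Insert a synonym of a random known word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        known = [w for w in out if w in SYNONYMS]
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def augment(sentence, alpha=0.1, p_delete=0.1, seed=0):
    """Return one augmented variant per EDA operation (four in total)."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of edits per operation
    variants = [
        synonym_replacement(words, n, rng),
        random_insertion(words, n, rng),
        random_swap(words, n, rng),
        random_deletion(words, p_delete, rng),
    ]
    return [" ".join(v) for v in variants]


if __name__ == "__main__":
    for variant in augment("that was a bad and stupid comment"):
        print(variant)
```

Each augmented variant would keep the label of the comment it was derived from and be added to the training data before fitting the classifier. When scoring with k-fold cross-validation, one common precaution is to augment only the training folds, so that near-duplicates of a held-out comment do not leak into training; whether the paper does this is not stated in the summary.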