Cyberbullying detection in social media text based on character‐level convolutional neural network with shortcuts

Nijia Lu, Guohua Wu, Zhen Zhang, Yitao Zheng, Yizhi Ren, Kim‐Kwang Raymond Choo
DOI: https://doi.org/10.1002/cpe.5627
2020-01-03
Concurrency and Computation: Practice and Experience
Abstract:<p>As people spend more and more time on social networks, cyberbullying has become a social problem that needs to be solved by machine learning methods. Our research focuses on textual cyberbullying detection because text is the most common form of content on social media. However, social media content is short, noisy, and unstructured, with incorrect spellings and symbols, and this degrades the performance of some traditional machine learning methods based on vocabulary knowledge. For this reason, we propose a Char‐CNNS (<b>Char</b>acter‐level <b>C</b>onvolutional <b>N</b>eural <b>N</b>etwork with <b>S</b>hortcuts) model to identify whether text in social media contains cyberbullying. We use characters as the smallest unit of learning, enabling the model to overcome spelling errors and intentional obfuscation in real‐world corpora. Shortcuts are used to stitch together features from different levels so that the model learns more granular bullying signals, and a focal loss function is adopted to overcome the class imbalance problem. We also provide a new Chinese <i>Weibo</i> comment dataset specifically for cyberbullying detection, and experiments are performed on both the Chinese Weibo dataset and the English Tweet dataset. The experimental results show that our approach is competitive with state‐of‐the‐art techniques on the cyberbullying detection task.</p>
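The abstract names a focal loss as the remedy for class imbalance but does not give its form. As a rough illustration only, the sketch below implements the commonly used binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the default values gamma=2.0 and alpha=0.25 are assumptions made for this example, not settings reported by the authors.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross-entropy for one example (baseline for comparison)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    p     -- predicted probability of the positive (bullying) class
    y     -- true label: 1 for bullying, 0 otherwise
    gamma -- focusing parameter (assumed value); larger values shrink
             the loss of well-classified examples more strongly
    alpha -- class weight for the positive class (assumed value)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With gamma set to 0 the modulating factor equals 1 and the expression reduces to an alpha-weighted cross-entropy, so the focusing effect comes entirely from gamma. In an imbalanced corpus, where most comments are non-bullying and easy to classify, this keeps the abundant easy examples from dominating the total loss.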