Character-Level Chinese Toxic Comment Classification Algorithm Based on CNN and Bi-GRU

Zhongguo Wang, Bao Zhang
DOI: https://doi.org/10.1145/3569966.3570000
2022-10-21
Abstract: At present, the classification of “toxic comments” is studied mainly in the English context, whereas the Chinese context is less explored and even lacks a public corpus. Because many comments are short texts with sparse features and strong context dependence, this study proposes a character-level embedded neural network model based on the convolutional neural network (CNN) and the bidirectional gated recurrent unit (Bi-GRU). The classification of toxic comments is then carried out on a Chinese toxic comment dataset. In the proposed model, the CNN, which combines character- and word-level vectors, extracts the important local features of the text, and the Bi-GRU's ability to capture sequential information in both directions is then used to improve the accuracy of Chinese toxic comment classification. Experimental results show that the proposed model reaches an F1 score of 0.8081, outperforming the comparison models.
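For readers unfamiliar with the reported metric, the following is a minimal sketch of how a binary F1 score can be computed from predicted and true labels. The function name `f1_score`, the label encoding (1 = toxic, 0 = non-toxic), and the example labels are assumptions made for illustration only; they are not taken from the paper, and the abstract does not state which averaging scheme the authors used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, invented for this example (1 = toxic, 0 = non-toxic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

With these illustrative labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and so is F1.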
Computer Science