Abstract: A flood of information is produced on a daily basis through global Internet usage, arising from on-line interactive communication among users. While this situation contributes significantly to the quality of human life, it unfortunately also involves enormous dangers, since on-line texts with high toxicity can cause personal attacks, on-line harassment and bullying behaviors. This has drawn the attention of both industry and the research community in the last few years, and there have been several attempts to identify an efficient model for on-line toxic comment prediction. However, these efforts are still in their infancy, and new approaches and frameworks are required. In parallel, the constant explosion of data makes the construction of new machine learning tools for managing this information an imperative need. Fortunately, advances in hardware, cloud computing and big data management allow the development of Deep Learning approaches, which have shown very promising performance so far. For text classification in particular, Convolutional Neural Networks (CNNs) have recently been proposed as a modern approach to text analytics that emphasizes the structure of words in a document. In this work, we employ this approach to discover toxic comments in a large pool of documents provided by a current Kaggle competition on Wikipedia's talk page edits. To justify this decision, we compare CNNs against the traditional bag-of-words approach to text analysis, combined with a selection of algorithms proven to be very effective in text classification. The reported results provide enough evidence that CNNs enhance toxic comment classification, reinforcing research interest in this direction.
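The abstract contrasts bag-of-words features, which discard word order, with a CNN that looks at the structure of words. The following is a minimal, hypothetical sketch of that difference in plain Python, not the authors' architecture. The 2-d embeddings (`EMBEDDINGS`), the single width-2 filter (`FILTER`) and the helper names `bag_of_words` and `conv_max_pool` are illustrative assumptions. A real model would learn many filters of several widths, typically start from pre-trained embeddings, and put a classifier on top of the pooled features.

```python
from collections import Counter

# Hypothetical hand-set 2-d embeddings (assumption, for illustration only):
# dimension 0 marks a negation word, dimension 1 marks an insult word.
# A real CNN would learn these or load pre-trained word vectors.
EMBEDDINGS = {
    "you": [0.0, 0.0],
    "are": [0.0, 0.0],
    "smart": [0.0, 0.0],
    "not": [1.0, 0.0],
    "stupid": [0.0, 1.0],
}

# One hypothetical filter spanning 2 consecutive words: it penalizes a
# negation in the first position and rewards an insult in the second.
FILTER = [[-1.0, 0.0], [0.0, 1.0]]


def bag_of_words(text: str, vocab: list[str]) -> list[int]:
    """Term counts over a fixed vocabulary; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def conv_max_pool(text: str, filt: list[list[float]]) -> float:
    """Slide `filt` over consecutive word windows, apply ReLU, keep the max response."""
    vectors = [EMBEDDINGS[token] for token in text.lower().split()]
    width = len(filt)
    best = 0.0  # starting at 0 clips negative window scores, i.e. ReLU before max-pooling
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        # Dot product between the filter and the stacked window of word vectors.
        score = sum(
            w * x
            for w_vec, x_vec in zip(filt, window)
            for w, x in zip(w_vec, x_vec)
        )
        best = max(best, score)
    return best


vocab = sorted(EMBEDDINGS)
a = "you are smart not stupid"
b = "you are stupid not smart"
print(bag_of_words(a, vocab) == bag_of_words(b, vocab))  # True: identical bags of words
print(conv_max_pool(a, FILTER), conv_max_pool(b, FILTER))  # 0.0 1.0
```

Both sentences contain exactly the same words, so any classifier fed only the count vectors must score them identically. The single convolutional filter, however, fires only on the window "are stupid" in the second sentence, because it sees words in their local context.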

A Survey of Toxic Comment Classification Methods

Convolutional Neural Networks for Toxic Comment Classification

An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding

Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study

Character-Level Chinese Toxic Comment Classification Algorithm Based on CNN and Bi-GRU

Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification

Toxic Comments Hunter : Score Severity of Toxic Comments

Detection of Toxic Language in Short Text Messages

Semantic sentiment analysis based on a combination of CNN and LSTM model

Toxic Comment Classification based on Personality Traits Using NLP

Purging the Poison: A Machine Learning Approach to Filtering Toxic Comments

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

Machine Learning and Lexicon Approach to Texts Processing in the Detection of Degrees of Toxicity in Online Discussions

Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection

Alexnet Architecture Based Convolutional Neural Network for Toxic Comments Classification

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks

Which one is more toxic? Findings from Jigsaw Rate Severity of Toxic Comments

Predicting Different Types of Subtle Toxicity in Unhealthy Online Conversations

Deep learning for religious and continent-based toxic content detection and classification