Abstract: Short message services (SMS), microblogging tools, instant messaging apps, and commercial websites produce numerous short text messages every day. These messages can usually reach a mass audience at low cost, and spammers take advantage of this by sending bulk malicious or unwanted messages. Short texts are difficult to classify because of their shortness, sparsity, rapidness, and informal writing. The effectiveness of the hidden Markov model (HMM) for short text classification has been illustrated in our previous study. However, the HMM has limited capability to handle new words, which are mostly generated by informal writing. In this paper, a hybrid model is proposed to address the informal writing issue by weighting new words for fast short text filtering with high accuracy. The hybrid model consists of an artificial neural network (ANN) and an HMM, which are used for new word weighting and spam filtering, respectively. The weight of a new word is calculated based on the weights of its neighbor, along with the spam and ham (i.e., not spam) probabilities of the short text message predicted by the ANN. Performance is evaluated on benchmark datasets, including the SMS message data maintained by the University of California, Irvine, movie reviews, and customer reviews. The hybrid model operates at a significantly higher speed than deep learning models. The experimental results show that the proposed hybrid model outperforms other prominent machine learning algorithms, achieving a good balance between filtering throughput and accuracy.
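
The new word weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: the function name `weight_new_words`, the blending parameter `alpha`, the use of only the immediately adjacent words as neighbors, and the convention that a weight near 1 means spam-like (with the ham probability taken as `1 - p_spam`). It is not the authors' method, only one plausible way to combine neighbor weights with a message-level spam probability such as an ANN might produce.

```python
from typing import Dict, List


def weight_new_words(
    tokens: List[str],
    known_weights: Dict[str, float],
    p_spam: float,
    alpha: float = 0.5,
) -> List[float]:
    """Assign a weight in [0, 1] to every token of one message.

    Known words keep their stored weight. A new (unseen) word gets a
    blend of the average weight of its known adjacent words and the
    message-level spam probability p_spam (hypothetical rule).
    """
    weights: List[float] = []
    for i, token in enumerate(tokens):
        if token in known_weights:
            weights.append(known_weights[token])
            continue

        # Collect the weights of the left and right neighbors, if known.
        neighbor_weights: List[float] = []
        if i > 0 and tokens[i - 1] in known_weights:
            neighbor_weights.append(known_weights[tokens[i - 1]])
        if i + 1 < len(tokens) and tokens[i + 1] in known_weights:
            neighbor_weights.append(known_weights[tokens[i + 1]])

        # With no known neighbor, fall back to the message-level probability.
        if neighbor_weights:
            neighbor_avg = sum(neighbor_weights) / len(neighbor_weights)
        else:
            neighbor_avg = p_spam

        weights.append(alpha * neighbor_avg + (1.0 - alpha) * p_spam)
    return weights


# Example: "fr33" is an informal spelling not present in the vocabulary.
example = weight_new_words(
    ["win", "fr33", "prize"],
    {"win": 0.8, "prize": 0.9},
    p_spam=0.9,
)
```

In this sketch the resulting per-word weights would then be passed to a sequence classifier such as an HMM; how the paper actually feeds the weights to its HMM is not stated in the abstract.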

Dynamic Rules' Score Adjustment In Spam Filter Using Users' Feedback

Incremental learning based on interactive spam filter

Real-Time Statistical Rules for Spam Detection

Effective spam filter based on a hybrid method of header checking and content parsing

Analyzing and Detecting Adversarial Spam on a Large-scale Online APP Review System.

A Composite Intelligent Method For Spam Filtering

Implementation and Evaluation of Chinese Spam Filtering System

Intelligent Detection Approaches for Spam

Filtering Spam In Social Tagging System With Dynamic Behavior Analysis

A Game Model for Adversarial Classification in Spam Filtering

A Spam Filtering Scheme Based on Scalable Decision Tree

Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering

Three-Way Decisions Solution to Filter Spam Email: An Empirical Study

An Adaptive Fusion Algorithm for Spam Detection

PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection.

Filtering Chinese Spam Email Using Logistic Regression

Spammer detection via ranking aggregation of group behavior

Spam Message Self-Adaptive Filtering System Based on Naive Bayes and Support Vector Machine

Classify E-mails by Support Vector Machine

A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts

Spam Classification: Genetically Optimized Passive-Aggressive Approach