Abstract: Email spam causes a serious waste of time and resources. This paper addresses the email spam filtering problem and proposes an online active multi-field learning approach based on three observations: (1) email spam filtering is an online application, which suggests online learning; (2) an email document has a multi-field text structure, which suggests multi-field learning; and (3) obtaining a label is costly for a real-world email spam filter, which suggests active learning. The online learner treats email spam filtering as incremental, supervised, binary classification of streaming text. The multi-field learner combines the results predicted by the field classifiers using a novel compound weight scheme, and each field classifier computes the arithmetic mean of multiple conditional probabilities derived from feature strings by means of a string-frequency index data structure. By comparing the current variance of the field classification results with the historical variance, the active learner evaluates the classification confidence and treats the more uncertain emails as the more informative samples for which to request a label. The experimental results show that the proposed approach can achieve state-of-the-art performance with greatly reduced label requirements and very low space-time costs. Measured by the standard (1-ROCA)% metric, the performance of our online active multi-field learning even exceeds the full-feedback performance of some advanced individual text classification algorithms.
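The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the feature strings are overlapping character 4-grams, the field results are combined by a plain average (a stand-in for the compound weight scheme, which the abstract does not specify), and a label is requested when the variance across field scores exceeds the mean of the historical variances. The names `FieldClassifier` and `OnlineActiveFilter` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


class FieldClassifier:
    """Classifier for one email field, backed by a string-frequency index."""

    def __init__(self, n=4):
        self.n = n  # n-gram length (assumed; not given in the abstract)
        # feature string -> [ham count, spam count]
        self.index = defaultdict(lambda: [0, 0])

    def features(self, text):
        # Overlapping character n-grams; short text yields itself as one feature.
        return {text[i:i + self.n] for i in range(max(len(text) - self.n + 1, 1))}

    def score(self, text):
        # Arithmetic mean of P(spam | feature string) over features seen so far.
        probs = []
        for s in self.features(text):
            if s in self.index:
                ham, spam = self.index[s]
                probs.append(spam / (ham + spam))
        return mean(probs) if probs else 0.5  # 0.5 when nothing is known

    def update(self, text, is_spam):
        # Incremental (online) update of the string-frequency index.
        for s in self.features(text):
            self.index[s][int(is_spam)] += 1


class OnlineActiveFilter:
    """Combines field classifiers and decides when to request a label."""

    def __init__(self, fields):
        self.classifiers = {f: FieldClassifier() for f in fields}
        self.variances = []  # history of variances across field scores

    def predict(self, email):
        scores = [clf.score(email.get(f, "")) for f, clf in self.classifiers.items()]
        combined = mean(scores)  # plain average: stand-in for the compound weights
        variance = pvariance(scores)
        # Assumed rule: field disagreement above the historical mean counts
        # as uncertainty, so the label is worth requesting.
        wants_label = not self.variances or variance > mean(self.variances)
        self.variances.append(variance)
        return combined, wants_label

    def learn(self, email, is_spam):
        for f, clf in self.classifiers.items():
            clf.update(email.get(f, ""), is_spam)


spam_filter = OnlineActiveFilter(["subject", "body"])
msg = {"subject": "cheap pills", "body": "buy cheap pills now"}
score, wants_label = spam_filter.predict(msg)
if wants_label:
    spam_filter.learn(msg, is_spam=True)  # label supplied by the user
```

In this sketch the filter only pays the labeling cost when `wants_label` is true, which is the source of the reduced label requirement; the index update touches only the feature strings of the labeled message, which keeps each online step cheap.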

Online Active Multi-Field Learning for Efficient Email Spam Filtering
