Abstract: Email spam wastes considerable time and resources. This paper addresses the email spam filtering problem and proposes an online active multi-field learning approach based on three ideas: (1) email spam filtering is an online application, which suggests online learning; (2) an email document has a multi-field text structure, which suggests multi-field learning; and (3) obtaining a label is costly for a real-world email spam filter, which suggests active learning. The online learner treats email spam filtering as incremental, supervised, binary classification of a text stream. The multi-field learner combines the predictions of the field classifiers using a novel compound weight schema, and each field classifier computes the arithmetic mean of multiple conditional probabilities, which are calculated from feature strings using a string-frequency index data structure. The active learner evaluates classification confidence by comparing the current variance of the field classifiers' results with the historical variance, and treats the more uncertain emails as the more informative samples for which to request labels. Experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements and very low space-time costs. Measured by the standard (1-ROCA)% metric, the performance of the proposed online active multi-field learning even exceeds the full-feedback performance of some advanced individual text classification algorithms.
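The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so several details here are assumptions. The feature strings are taken to be overlapping character 4-grams, the string-frequency index is taken to be a map from feature string to ham and spam counts, the compound weight schema is replaced by a plain unweighted mean of the field scores, and the active learner requests a label whenever the current variance of the field scores is at least the running mean of past variances. The names `FieldClassifier`, `OnlineActiveFilter`, and `oracle` are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


class FieldClassifier:
    """Per-field classifier over a string-frequency index (assumed layout)."""

    def __init__(self, n=4):
        self.n = n  # n-gram length; an assumption, not taken from the paper
        # feature string -> [ham_count, spam_count]
        self.index = defaultdict(lambda: [0, 0])

    def features(self, text):
        # Overlapping character n-grams; short texts become a single feature.
        if len(text) <= self.n:
            return {text} if text else set()
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def score(self, text):
        # Arithmetic mean of P(spam | feature) over features already indexed.
        probs = []
        for feature in self.features(text):
            ham, spam = self.index.get(feature, (0, 0))
            if ham + spam:
                probs.append(spam / (ham + spam))
        return sum(probs) / len(probs) if probs else 0.5  # 0.5 = no evidence

    def update(self, text, is_spam):
        # Incremental learning: bump the count for the observed class.
        for feature in self.features(text):
            self.index[feature][int(is_spam)] += 1


class OnlineActiveFilter:
    """Combines field scores and decides when to request a label."""

    def __init__(self, fields):
        self.classifiers = {name: FieldClassifier() for name in fields}
        self.var_sum = 0.0  # sum of past variances of field scores
        self.seen = 0       # number of emails processed so far

    def process(self, email, oracle):
        scores = [clf.score(email.get(name, ""))
                  for name, clf in self.classifiers.items()]
        # Stand-in for the compound weight schema: unweighted mean.
        combined = sum(scores) / len(scores)
        current_var = pvariance(scores)
        historical_var = self.var_sum / self.seen if self.seen else 0.0
        self.var_sum += current_var
        self.seen += 1
        # Assumed rule: field disagreement at or above the historical
        # level marks the email as uncertain, so a label is requested.
        asked = current_var >= historical_var
        if asked:
            label = oracle(email)
            for name, clf in self.classifiers.items():
                clf.update(email.get(name, ""), label)
        return combined, asked
```

A filter would be built as `OnlineActiveFilter(["subject", "body"])` and fed one email at a time as a dict of field texts, with `oracle` standing in for the user who supplies a label on request. The returned score is an estimate of the spam probability, and the flag reports whether a label was requested for that email.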

Active learning based spam filtering method

Online Active Multi-Field Learning for Efficient Email Spam Filtering

Quick Online Spam Classification Method Based on Active and Incremental Learning

Active Learning for Spam Email Classification

Ensemble Learning and Active Learning Based Personal Spam Email Filtering

Training SpamAssassin with Active Semi-supervised Learning

Spam Filtering by Stages

Effective spam filter based on a hybrid method of header checking and content parsing

An Imbalanced Spam Mail Filtering Method

A Spam Filtering Method Based on Bayesian Neural Network

Active Learning for Online Spam Filtering

Research On Advanced Filtering Algorithm For Anti-Spam Based On A Bayesian Classification Model

Research of a Novel Anti-Spam Technique Based on Users' Feedback and Improved Naive Bayesian Approach

SVM-Based Spam Filter with Active and Online Learning

Incremental learning based on interactive spam filter

Combining behavior and Bayesian Chinese spam filter

A Spam Filter Approach with the Improved Machine Learning Technology

NASC: A Novel Approach for Spam Classification

Spam Filtering System Based on Uncertain Learning

Improving The Performance Of Naive Bayes Classifier For Spam Detection

A Two-Stage Spam Email Filtering Method Based on Naive Bayes and Hierarchical Clustering