Abstract: Email spam wastes considerable time and resources. This paper addresses the email spam filtering problem and proposes an online active multi-field learning approach based on three observations: (1) email spam filtering is an online application, which suggests online learning; (2) an email document has a multi-field text structure, which suggests multi-field learning; and (3) obtaining labels for a real-world email spam filter is costly, which suggests active learning. The online learner treats email spam filtering as incremental, supervised, binary classification of a text stream. The multi-field learner combines the predictions of the individual field classifiers using a novel compound weight scheme. Each field classifier computes the arithmetic mean of the conditional probabilities of its feature strings, which are calculated from a string-frequency index data structure. By comparing the current variance of the field classifiers' results with their historical variance, the active learner estimates classification confidence and treats more uncertain emails as more informative samples for which to request labels. Experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements and very low space and time costs. Measured by the standard (1-ROCA)% metric, our online active multi-field learning even surpasses some advanced individual text classification algorithms running with full feedback.
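The pipeline described in the abstract (per-field classifiers over a string-frequency index, a weighted combination of field scores, and a variance-based decision on when to request a label) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature strings are assumed to be overlapping character 4-grams, the conditional probability uses additive smoothing, the fixed per-field weights are a simple stand-in for the paper's compound weight scheme (whose details are not given here), and the query rule (ask for a label when the current variance of field scores is at least the running mean of past variances) is one assumed reading of "comparing the current variance with the historical variance". The names `feature_strings`, `FieldClassifier`, `MultiFieldFilter` and `oracle`, and the field names used below, are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance

N_GRAM = 4  # hypothetical feature-string length (character n-grams)


def feature_strings(text, n=N_GRAM):
    """Split a field's text into overlapping character n-grams (assumed features)."""
    text = text.lower()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class FieldClassifier:
    """One classifier per email field, backed by a string-frequency index."""

    def __init__(self, smoothing=1.0):
        # string-frequency index: feature string -> [spam_count, ham_count]
        self.index = defaultdict(lambda: [0, 0])
        self.smoothing = smoothing  # assumed additive smoothing constant

    def predict(self, text):
        """Arithmetic mean of the smoothed P(spam | s) over the field's strings."""
        strings = feature_strings(text)
        if not strings:
            return 0.5  # no evidence for this field: neutral score
        a = self.smoothing
        probs = []
        for s in strings:
            spam, ham = self.index.get(s, (0, 0))  # .get does not insert keys
            probs.append((spam + a) / (spam + ham + 2 * a))
        return mean(probs)

    def update(self, text, is_spam):
        """Incremental online update: bump the counts of every feature string."""
        for s in feature_strings(text):
            self.index[s][0 if is_spam else 1] += 1


class MultiFieldFilter:
    """Combines field classifiers and decides when to request a label."""

    def __init__(self, fields, weights=None):
        self.fields = list(fields)
        self.classifiers = {f: FieldClassifier() for f in self.fields}
        # Fixed per-field weights: a simple stand-in, NOT the paper's compound weight scheme
        self.weights = weights or {f: 1.0 for f in self.fields}
        self.var_sum = 0.0  # running statistics of past field-score variances
        self.var_count = 0

    def field_scores(self, email):
        return [self.classifiers[f].predict(email.get(f, "")) for f in self.fields]

    def score(self, scores):
        """Weighted average of field scores; values above 0.5 lean spam."""
        total = sum(self.weights[f] for f in self.fields)
        return sum(self.weights[f] * s for f, s in zip(self.fields, scores)) / total

    def process(self, email, oracle):
        """Classify one email from the stream; query `oracle` only if uncertain."""
        scores = self.field_scores(email)
        spam_score = self.score(scores)
        current_var = pvariance(scores)
        historical_var = self.var_sum / self.var_count if self.var_count else 0.0
        self.var_sum += current_var
        self.var_count += 1
        # Assumed rule: fields disagreeing at least as much as usual => uncertain => ask
        queried = current_var >= historical_var
        if queried:
            is_spam = oracle(email)  # costly label request
            for f in self.fields:
                self.classifiers[f].update(email.get(f, ""), is_spam)
        return spam_score, queried
```

In use, `process` would be called once per incoming email (a dict mapping field names such as "subject" and "body" to text) with a callback that returns the true label, for example a user's spam report. Only queried emails update the index, which is where the label savings would come from under these assumptions; storing running sums rather than a full variance history keeps the per-email cost constant.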

Adaptive Email Spam Filtering Based on Information Theory

Adaptive transfer learning for spam filtering

Online Active Multi-Field Learning for Efficient Email Spam Filtering

An Adaptive Fusion Algorithm for Spam Detection

Research On Advanced Filtering Algorithm For Anti-Spam Based On A Bayesian Classification Model

Effective spam filter based on a hybrid method of header checking and content parsing

An Imbalanced Spam Mail Filtering Method

Utilizing Multi-Field Text Features for Efficient Email Spam Filtering

Intelligent spam mail filtering system based on comprehensive information theory

Design And Implement Cost-Sensitive Email Filtering Algorithms

A Composite Intelligent Method For Spam Filtering

An Advanced Spam Detection Technique Based On Self-Adaptive Piecewise Hash Algorithm

A Spam Acquirement Technology Based on Immune-Inspired Clustering Algorithm

An Evidential Spam-Filtering Framework

Artificial Immune System Inspired Behavior-Based Anti-Spam Filter

Simplified Chinese spam mail filter: design and performance evaluation

Research on Advanced Filtering Algorithm for Spam Email Based on Bayes Parameter Estimation

Research of a Novel Anti-Spam Technique Based on Users' Feedback and Improved Naive Bayesian Approach

Spam Filtering Based on Knowledge Transfer Learning

Advanced Filtering Technology for Spam E-mail Based on Multilevel Attributes Set

An Evaluation of Statistical Spam Filtering Techniques