The Improved Logistic Regression Models for Spam Filtering

Yong Han, Muyun Yang, Haoliang Qi, Xiaoning He, Sheng Li
DOI: https://doi.org/10.1109/IALP.2009.74
2009-01-01
Abstract: The logistic regression model has achieved success in spam filtering. However, it has a drawback: during training, it adjusts equally the weights of features that appear in both spam and ham messages. This paper presents an improved logistic regression model that reduces the impact of features appearing in both spam and ham messages. Byte-level n-grams are used to extract features from messages, and TONE (train on or near error) is adopted; both techniques have proven effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) Spam-filter Challenge 2008 show that the proposed model is one of the best methods. Our system achieved competitive results in all tasks and won the active learning task on the live stream, as measured by 1-ROCA.
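To make the ingredients named in the abstract concrete, the following is a minimal sketch of a baseline online logistic regression filter with byte-level n-gram features and TONE training. It is not the authors' implementation. The class name `ToneLogisticRegression`, the helper `byte_ngrams`, the n-gram length of 4, the learning rate, and the margin are all illustrative assumptions. The improved weight adjustment for features shared by spam and ham is not reproduced here, because the abstract does not give its formula.

```python
import math
from collections import defaultdict


def byte_ngrams(message: bytes, n: int = 4) -> set:
    """Return the set of distinct byte-level n-grams in a message."""
    return {message[i:i + n] for i in range(len(message) - n + 1)}


class ToneLogisticRegression:
    """Baseline online logistic regression with binary n-gram features.

    All hyperparameter defaults are assumptions chosen for illustration.
    """

    def __init__(self, learning_rate: float = 0.1, margin: float = 0.2, n: int = 4):
        self.weights = defaultdict(float)  # one weight per observed n-gram
        self.learning_rate = learning_rate
        self.margin = margin  # half-width of the "near error" band around 0.5
        self.n = n

    def score(self, message: bytes) -> float:
        """Estimated probability that the message is spam."""
        z = sum(self.weights.get(g, 0.0) for g in byte_ngrams(message, self.n))
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, message: bytes, is_spam: bool) -> bool:
        """Apply one TONE step; return True if the weights were updated."""
        p = self.score(message)
        is_error = (p > 0.5) != is_spam          # misclassified
        is_near = abs(p - 0.5) < self.margin     # correct but not confident
        if not (is_error or is_near):
            return False                         # confident and correct: skip
        label = 1.0 if is_spam else 0.0
        step = self.learning_rate * (label - p)  # gradient of the log-likelihood
        # Every feature present in the message gets the same adjustment,
        # which is the equal-adjustment behaviour the abstract criticises.
        for g in byte_ngrams(message, self.n):
            self.weights[g] += step
        return True
```

In use, `train` is called once per labelled message in stream order, and `score` is thresholded at 0.5 (or another operating point) to classify. Skipping confident, correct messages is what distinguishes TONE from training on every example.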