Research on the Characteristic of Partial Dependency for Spam Classification

LIU Zhen, TAN Liang, ZHOU Ming-tian
DOI: https://doi.org/10.3321/j.issn:0372-2112.2007.10.012
2007-01-01
Abstract: Because a false positive (a legitimate message classified as spam) harms an email filter's performance far more than a false negative does, it is necessary to investigate how to make the filter more sensitive to the cost of false positives. This paper proposes an improved fitted logistic regression model for spam discrimination that introduces a coefficient-weighted function to support unbalanced classifier training. Performance evaluation on real email test sets shows that, without degrading classification precision, the new categorization model exhibits a clear partial-dependency characteristic between the false positive ratio and the false negative ratio. The test results also suggest that the model is robust to perturbed data.
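The abstract does not give the form of the coefficient-weighted function, so the following is only a minimal sketch of the general idea of unbalanced training, not the authors' model. It assumes a constant per-class weight (the hypothetical parameter `ham_weight`) that multiplies the loss gradient of legitimate messages, so that errors on legitimate mail cost more than errors on spam. All function and parameter names are illustrative.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(w, b, x):
    # Estimated probability that feature vector x is spam.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_weighted_lr(X, y, ham_weight=5.0, lr=0.1, epochs=500):
    """Batch gradient descent on a class-weighted log loss.

    y[i] is 1 for spam and 0 for legitimate mail (ham).
    ham_weight is an assumed constant cost factor, not the paper's function:
    values above 1 penalize false positives more than false negatives.
    """
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = spam_probability(w, b, xi)
            cost = 1.0 if yi == 1 else ham_weight
            err = cost * (p - yi)  # weighted gradient of the log loss
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy one-feature data with overlapping classes (invented for illustration).
X = [[0.0], [1.0], [2.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]

w_plain, b_plain = train_weighted_lr(X, y, ham_weight=1.0)
w_cost, b_cost = train_weighted_lr(X, y, ham_weight=5.0)

ambiguous = [1.5]
print("unweighted:", spam_probability(w_plain, b_plain, ambiguous))
print("ham-weighted:", spam_probability(w_cost, b_cost, ambiguous))
```

With the larger weight on legitimate mail, the fitted model assigns a lower spam probability to the ambiguous message, which trades a lower false positive ratio for a higher false negative ratio at a fixed decision threshold.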