Multi-Field Learning For Email Spam Filtering

Wuying Liu, Ting Wang
DOI: https://doi.org/10.1145/1835449.1835595
2010-01-01
Abstract: By investigating the structure of email documents, this paper proposes a multi-field learning (MFL) framework that breaks the multi-field document text classification (TC) problem into several sub-document TC problems and makes the final category prediction by a weighted linear combination of the sub-document TC results. Many previous statistical TC algorithms can easily be rebuilt within the MFL framework by converting their binary result into a spamminess score, a real number that reflects the likelihood that the classified email is spam. Experimental results on the TREC spam track show that the performance of many TC algorithms can be improved within the MFL framework.
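The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`subject`, `body`), the weights, the 0.5 decision threshold, and the toy keyword-based scorer are all assumptions made for illustration, since the abstract specifies none of them. The sketch only shows the general shape: one scorer per field returns a real-valued spamminess score, and the final score is a weighted linear combination of those per-field scores.

```python
from typing import Callable, Dict, Set

# A field scorer maps the text of one email field to a spamminess score.
FieldScorer = Callable[[str], float]


def keyword_scorer(spam_words: Set[str]) -> FieldScorer:
    """Toy stand-in for a per-field classifier (hypothetical, for illustration).

    Returns the fraction of tokens that appear in spam_words, so the
    score lies in [0, 1].
    """
    def score(text: str) -> float:
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        hits = sum(1 for token in tokens if token in spam_words)
        return hits / len(tokens)
    return score


def combine_spamminess(fields: Dict[str, str],
                       scorers: Dict[str, FieldScorer],
                       weights: Dict[str, float]) -> float:
    """Weighted linear combination of per-field spamminess scores.

    Fields without a scorer are skipped; a missing weight counts as 0.
    """
    total = 0.0
    for name, text in fields.items():
        if name in scorers:
            total += weights.get(name, 0.0) * scorers[name](text)
    return total


def is_spam(score: float, threshold: float = 0.5) -> bool:
    """Turn the combined score into a binary decision (threshold is assumed)."""
    return score >= threshold


# Example with assumed fields and weights.
spam_words = {"free", "money"}
scorers = {"subject": keyword_scorer(spam_words),
           "body": keyword_scorer(spam_words)}
weights = {"subject": 0.6, "body": 0.4}
email = {"subject": "free money now",
         "body": "hello friend see you at lunch"}

combined = combine_spamminess(email, scorers, weights)
print(combined, is_spam(combined))
```

In a real system each toy scorer would be replaced by a statistical TC model trained on that field alone, and the weights would be chosen or learned rather than fixed by hand.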