Highly Discriminative Features for Phishing Email Classification by SVD

Masoumeh Zareapoor, Pourya Shamsolmoali, M. Afshar Alam
DOI: https://doi.org/10.1007/978-81-322-2250-7_65
2015-01-01
Abstract: Unstructured text documents have recently drawn more attention because, with the growing number of text documents, there is a need to classify them automatically. However, an important problem in the field of text categorization is the high-dimensional and very sparse data, which hurts the generalization performance of classifiers. This paper applies Singular Value Decomposition (SVD) to email classification in order to optimally compress only the documents of interest (in our experiments, the email classes) and to retain the most informative and discriminative features of an email document. The performance evaluation is carried out on a publicly available email dataset to demonstrate the benefit of latent semantic analysis (LSA), which applies SVD to the term-document matrix.
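The LSA idea in the abstract can be illustrated with a minimal sketch: build a sparse term-document matrix A (terms as rows, emails as columns), keep only the k largest singular triplets so that A ≈ U_k S_k V_k^T, and compare emails in the resulting k-dimensional space. Everything specific here is an assumption, not the authors' method: the toy emails and labels are hypothetical, tokenization is plain whitespace splitting, weights are raw term counts, k = 2, and the nearest-centroid cosine classifier is only a stand-in because the abstract does not name the classifier used. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy data: illustrative only, NOT drawn from the paper's dataset.
TRAIN_EMAILS = [
    "verify your account password now",
    "urgent verify your account password",
    "verify your bank account password",
    "team meeting moved to friday",
    "project meeting notes attached",
    "lunch meeting on friday",
]
TRAIN_LABELS = np.array([1, 1, 1, 0, 0, 0])  # 1 = phishing, 0 = legitimate


def build_vocabulary(docs):
    """Map each distinct whitespace-separated token to a row index."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(vocab)}


def term_vector(text, vocab):
    """Raw term-frequency vector; tokens outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v


def fit_lsa(docs, k):
    """Truncated SVD of the term-document matrix A (terms x docs): A ~ U_k S_k V_k^T."""
    vocab = build_vocabulary(docs)
    A = np.column_stack([term_vector(d, vocab) for d in docs])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return vocab, A, U[:, :k], s, Vt[:k, :]


def project(text, vocab, U_k):
    """Fold a document into the k-dim concept space: d_hat = U_k^T d."""
    return U_k.T @ term_vector(text, vocab)


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)


def classify(text, vocab, U_k, centroids):
    """Stand-in classifier: label of the most cosine-similar class centroid."""
    q = project(text, vocab, U_k)
    return max(centroids, key=lambda label: cosine(q, centroids[label]))


K = 2
vocab, A, U_k, s, Vt_k = fit_lsa(TRAIN_EMAILS, K)
train_vecs = np.array([project(d, vocab, U_k) for d in TRAIN_EMAILS])
centroids = {c: train_vecs[TRAIN_LABELS == c].mean(axis=0) for c in (0, 1)}

print(len(vocab), "terms reduced to", K, "dimensions")  # 17 terms reduced to 2 dimensions
print(classify("verify your bank password", vocab, U_k, centroids))  # 1 (phishing)
print(classify("meeting moved to friday", vocab, U_k, centroids))    # 0 (legitimate)
```

Because U_k^T A = S_k V_k^T, projecting a training email with U_k^T yields the same coordinates that the truncated SVD assigns to it, so new emails can be folded in without recomputing the decomposition. In this toy corpus the two classes share no vocabulary, which makes the top two singular directions separate them cleanly; real email data would need term weighting and a larger k chosen on held-out data.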