Text Mining for Phishing E-mail Detection

Masoumeh Zareapoor, K. R. Seeja
DOI: https://doi.org/10.1007/978-81-322-2012-1_8
2014-01-01
Abstract: Phishing e-mails are a threat to online banking transactions because they mislead customers into disclosing valuable information, which results in monetary losses. A common approach is to extract specific features from phishing e-mails semi-automatically using small scripts, which is a very tedious process. This paper proposes text mining for extracting distinguishing features from a collection of e-mails that contains both phishing and legitimate messages, for better detection of phishing attacks. The proposed method first converts the e-mails to a vector representation; feature selection techniques are then used to select the best features for classification. The proposed method is evaluated on a data set collected from the HamCorpus of the SpamAssassin project (legitimate e-mail) and the publicly available PhishingCorpus (phishing e-mail), and text mining-based phishing detection is found to be simple, fast, and more accurate than state-of-the-art methods.
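The two steps named in the abstract, vectorization followed by feature selection, can be illustrated with a minimal sketch. The abstract does not say which feature selection techniques or classifier the authors used, so the chi-square scoring below is an assumption chosen for illustration, and the tokenizer, the function names, and the four toy e-mails are all hypothetical.

```python
from collections import Counter

def tokenize(text):
    # Assumed, simplistic tokenizer: lowercase, keep alphabetic runs only.
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def chi2_scores(docs, labels):
    """Chi-square score of each term's presence against a binary label (1 = phishing)."""
    n = len(docs)
    n_pos = sum(labels)
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = set().union(*token_sets)
    scores = {}
    for term in vocab:
        a = sum(1 for s, y in zip(token_sets, labels) if term in s and y == 1)
        b = sum(1 for s, y in zip(token_sets, labels) if term in s and y == 0)
        c = n_pos - a          # phishing e-mails without the term
        d = (n - n_pos) - b    # legitimate e-mails without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_features(docs, labels, k):
    # Keep the k highest-scoring terms; ties are broken alphabetically.
    scores = chi2_scores(docs, labels)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def vectorize(text, features):
    # Term-count vector restricted to the selected features.
    counts = Counter(tokenize(text))
    return [counts[f] for f in features]

# Hypothetical toy data, not drawn from the corpora named in the abstract.
emails = [
    "Verify your account password now",
    "Your account is suspended, verify password",
    "Meeting agenda for Monday",
    "Lunch on Monday with the team",
]
labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate

features = select_features(emails, labels, k=5)
vectors = [vectorize(e, features) for e in emails]
```

The resulting `vectors` could be passed to any standard classifier. On real corpora a library implementation of vectorization and feature selection would likely be preferable to this hand-rolled version.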