Applying machine learning and natural language processing to detect phishing email

Areej Alhogail, Afrah Alsabih
DOI: https://doi.org/10.1016/j.cose.2021.102414
2021-11-01
Abstract: The growth of online services has been accompanied by a growth in cyber-attacks. One of the most common and effective attacks is phishing, in which an attacker attempts to steal confidential information by impersonating a legitimate source. Phishing emails succeed by manipulating human emotions: they raise concern and create a sense of urgency by claiming that the recipient must act quickly, which can lead to great financial and data losses. Therefore, we cannot rely solely on humans to detect phishing, and more effective, automatic phishing detection mechanisms are required. Many detectors have been proposed; however, the high number of phishing emails calls for additional effort. Hence, in this study, we propose a phishing email classifier model that applies deep learning, using a graph convolutional network (GCN) and natural language processing over the email body text, to improve phishing detection accuracy. The literature has shown the success of GCNs in text classification, and this study shows their success in improving the accuracy of phishing email detection. The classifier was evaluated in a supervised learning setting. Experimental tests verified that the classifier was effective in detecting phishing emails from body text relative to existing detection methods; it ran in a short time and produced a high accuracy rate of 98.2% and a low false-positive rate of 0.015.
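The two reported figures, accuracy and false-positive rate, can be read as ratios over a confusion matrix. The sketch below is only an illustration under stated assumptions: it treats "phishing" as the positive class and uses the standard definitions accuracy = (TP + TN) / total and false-positive rate = FP / (FP + TN), which the abstract does not spell out. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive="phishing"):
    """Count TP, FP, TN, FN for a binary task with the given positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # phishing email correctly flagged
            else:
                fp += 1  # legitimate email wrongly flagged
        else:
            if truth == positive:
                fn += 1  # phishing email missed
            else:
                tn += 1  # legitimate email correctly passed
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all emails classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(fp, tn):
    """Fraction of legitimate emails that were wrongly flagged as phishing."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Toy labels invented for illustration only (not data from the paper).
y_true = ["phishing", "legitimate", "legitimate", "phishing", "legitimate", "legitimate"]
y_pred = ["phishing", "legitimate", "phishing", "phishing", "legitimate", "legitimate"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("false-positive rate:", false_positive_rate(fp, tn))
```

Under these definitions, a false-positive rate of 0.015 would mean that about 1.5% of legitimate emails are flagged as phishing.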
computer science, information systems