Improving Phishing Detection Via Psychological Trait Scoring

Sadat Shahriar, Arjun Mukherjee, Omprakash Gnawali
DOI: https://doi.org/10.48550/arXiv.2208.06792
2022-08-14
Social and Information Networks
Abstract: Phishing emails exhibit certain psychological traits that are absent from legitimate emails. Based on empirical analysis and prior research, we identify three traits as most dominant in phishing emails: A Sense of Urgency, Inducing Fear by Threatening, and Enticement with Desire. We manually label 10% of the phishing emails in our training dataset for these three traits. We leverage this knowledge by training BERT, Sentence-BERT (SBERT), and character-level CNN models, whose last layers capture the nuances that form the Phishing Psychological Trait (PPT) scores. For the phishing email detection task, we use pretrained BERT and SBERT models and concatenate their outputs with the PPT scores as input to a fully connected neural network. Our results show that adding the PPT scores improves model performance significantly, indicating that the PPT scores effectively capture the psychological nuances. Furthermore, to mitigate the effect of the imbalanced training dataset, we use the GPT-2 model (Radford et al., 2019) to generate additional phishing emails. Our best model outperforms the F1 score of the current state-of-the-art (SOTA) model by 4.54%. Additionally, our analysis of individual PPTs suggests that Fear provides the strongest cue for detecting phishing emails.
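The detection step summarised in the abstract (a pretrained-model text embedding concatenated with the three PPT scores, then passed through a fully connected network) can be sketched as follows. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation. The abstract does not give layer sizes, activations, or training details, so the single ReLU hidden layer, the sigmoid output, the random initialisation, the trait ordering, and the names `concat_features` and `PhishingHead` are all hypothetical. A real embedding would come from BERT or SBERT (e.g. 768 dimensions for BERT-base) rather than the short toy vector used here.

```python
import math
import random
from typing import List, Sequence

# Hypothetical trait order: the abstract names the three traits but not a vector layout.
TRAITS = ("urgency", "fear", "desire")


def concat_features(text_embedding: Sequence[float], ppt_scores: Sequence[float]) -> List[float]:
    """Append the three PPT scores to a sentence embedding (e.g. from BERT/SBERT)."""
    if len(ppt_scores) != len(TRAITS):
        raise ValueError(f"expected {len(TRAITS)} PPT scores, got {len(ppt_scores)}")
    return list(text_embedding) + list(ppt_scores)


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class PhishingHead:
    """Assumed architecture: features -> ReLU hidden layer -> sigmoid phishing probability.

    Weights are randomly initialised (untrained) purely to show the data flow.
    """

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def predict_proba(self, features: Sequence[float]) -> float:
        if len(features) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(features)}")
        hidden = relu(dense(features, self.w1, self.b1))
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return sigmoid(logit)


if __name__ == "__main__":
    embedding = [0.12, -0.40, 0.33, 0.05]  # toy stand-in for a 768-d BERT/SBERT vector
    ppt = [0.9, 0.8, 0.1]                  # toy urgency, fear, desire scores
    features = concat_features(embedding, ppt)
    head = PhishingHead(in_dim=len(features), hidden_dim=8, seed=42)
    print(f"P(phishing) = {head.predict_proba(features):.3f}")
```

In a trained system the head's weights would be learned from labelled emails. Concatenation keeps the three trait scores as explicit, low-dimensional features next to the dense text embedding, so the classifier can weight them directly.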