Unmasking phishers: ML for malicious certificate detection

Magnea Haraldsdóttir, Sajad Homayoun, Emil Lynge, Christian D. Jensen
DOI: https://doi.org/10.1016/j.cie.2024.110652
IF: 7.18
2024-10-20
Computers & Industrial Engineering
Abstract: Phishing attacks increasingly use digital certificates to appear safe to users, and the frequency of such attacks has surged in recent years. For example, around 80% of phishing attacks in 2021 used digital certificates to appear legitimate. The most common detection methods today rely on users reporting websites to phishing repositories, where the reports are then confirmed. This process can be slow, giving attackers time to keep their phishing sites live on the Internet. Newer methods that apply machine learning models to a website's digital certificate have been shown to be effective. This paper presents a system that combines certificate-related and domain-name-related features with machine learning methods to detect phishing websites. To develop the system, domain names were collected from PhishTank and Tranco, and certificates were retrieved using Censys. The domain-related features are partly engineered using a time-series-based deep learning model that produces a vector representation of the domain name. Classical machine learning classifiers are then trained on the features engineered from the certificate and domain name, and compared. Enriching the feature set with the vector representation of the domain names improves the separation of suspicious certificates from benign ones, raising the F1-score from 0.77 for a feature set based solely on certificate-related features to 0.89 for the enriched feature set. A time-based evaluation yields a consistent F1-score of 0.88, an improvement over existing approaches to feature engineering.
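For readers unfamiliar with the metric, the F1-scores reported in the abstract are the harmonic mean of precision and recall on the positive (phishing) class. The following is a minimal sketch of that computation in plain Python. The function name `f1_score` and the toy labels are assumptions made for illustration and are not taken from the paper's code or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score (harmonic mean of precision and recall)
    for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration: 1 = phishing certificate, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1-score is 0.75.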
Computer Science, Interdisciplinary Applications; Engineering, Industrial