Deep Learning-Based Speech and Vision Synthesis to Improve Phishing Attack Detection through a Multi-layer Adaptive Framework

Tosin Ige, Christopher Kiekintveld, Aritran Piplai
2024-02-27
Abstract: Attackers keep finding new ways to improve their phishing techniques and bypass existing state-of-the-art phishing detection methods. This poses significant challenges to researchers in both industry and academia, because current approaches cannot detect complex phishing attacks. Current anti-phishing methods therefore remain vulnerable to complex phishing, given the increasingly sophisticated tactics attackers adopt and the rate at which they develop new tactics to evade detection. In this research, we propose an adaptable framework that combines deep learning and Random Forest to read images, synthesize speech from deep-fake videos, and apply natural language processing across several layered prediction stages, significantly increasing the performance of machine learning models for phishing attack detection.
Cryptography and Security, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the shortcomings of existing anti-phishing detection methods when they face complex and increasingly sophisticated phishing attacks. Current anti-phishing methods rely mainly on traditional techniques such as blacklists/whitelists, natural language processing, visual similarity, and rules. These techniques struggle with complex phishing websites that use deep-fake videos, images, or text content, so existing machine-learning models are limited in their ability to detect complex phishing attacks.

### Main problems include:

1. **Complex phishing techniques**: Attackers keep refining their phishing techniques to bypass existing state-of-the-art detection methods.
2. **Dataset quality**: The datasets used to train the models do not reflect attackers' constantly changing strategies, which leads to poor model performance.
3. **Balance between human factors and model accuracy**: Legitimate, newly registered websites may be mislabeled as illegitimate because of their weak domain authority.
4. **Short lifespan of phishing websites**: Phishing websites are usually created and taken down within a short time, which makes them hard to detect.
5. **Insufficient detection of uploaded multimedia content**: Existing machine-learning models cannot reliably detect phishing websites that use deep-fake videos, images, or text content.

### Solutions proposed in the paper:

To address these problems, the paper proposes a multi-layer adaptive framework that combines deep learning and random forest algorithms. It uses computer vision to read images, synthesizes speech from deep-fake videos (extracting the audio and transcribing it to text), and applies natural language processing; according to the authors, this significantly improves the performance of machine-learning models for phishing attack detection.
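The layered idea can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' implementation. The OCR and speech-to-text outputs are assumed to already be available as strings; the URL layer counts a few common lexical red flags instead of using a trained Random Forest; and the final layer counts suspicious phrases in place of an LSTM. The names `PageArtifacts`, `url_layer`, `text_layer`, `predict` and `SUSPICIOUS_TERMS`, the equal-weight averaging of the two scores, and the 0.4 threshold are all illustrative assumptions. The summary does not say how the URL-layer result is combined with the text-based prediction.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse

# Hypothetical phrase list standing in for features a trained LSTM would learn.
SUSPICIOUS_TERMS = ("verify your account", "password", "urgent", "suspended", "login")


@dataclass
class PageArtifacts:
    """Inputs for one page; field names are hypothetical."""
    url: str
    image_text: List[str] = field(default_factory=list)   # assumed OCR output (layer 2)
    transcripts: List[str] = field(default_factory=list)  # assumed speech-to-text output (layer 3)


def url_layer(url: str) -> float:
    """Layer 1 stand-in: fraction of simple lexical red flags (not a Random Forest)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    flags = [
        "@" in url,                  # '@' can hide the real destination
        host.count(".") > 3,         # many subdomains
        "-" in host,                 # hyphenated look-alike domains
        parsed.scheme != "https",    # no TLS
        len(url) > 75,               # unusually long URL
    ]
    return sum(flags) / len(flags)


def text_layer(texts: List[str]) -> float:
    """Layer 4 stand-in: share of suspicious phrases found (an LSTM would go here)."""
    corpus = " ".join(texts).lower()
    hits = sum(term in corpus for term in SUSPICIOUS_TERMS)
    return hits / len(SUSPICIOUS_TERMS)


def predict(page: PageArtifacts, threshold: float = 0.4) -> bool:
    """Combine the layer scores; equal weighting and the threshold are arbitrary."""
    texts = page.image_text + page.transcripts
    score = (url_layer(page.url) + text_layer(texts)) / 2
    return score >= threshold


page = PageArtifacts(
    url="http://secure-login.example.com.account.verify.net/@update",
    image_text=["Your account is suspended"],
    transcripts=["Urgent: verify your account password"],
)
print(predict(page))  # True
```

Keeping each layer in its own function mirrors the layered structure: any stand-in could be replaced by a trained model without changing `predict`.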
Specifically, the framework has four layers:

- **First layer (URL-Based Training)**: Trains traditional machine-learning models on URLs, selects the best features, and classifies each URL.
- **Second layer (Image Processing)**: Crawls each web page's HTML content and uses OCR to convert images into text.
- **Third layer (Speech Synthesis)**: Extracts audio from videos and converts it into text through speech recognition.
- **Fourth layer (Final Prediction with LSTM)**: Feeds the text produced by the previous three layers into an LSTM network for the final prediction.

Through this multi-layer adaptive framework, the paper aims to overcome the limitations of existing methods and improve the detection of complex phishing attacks.

### Formula representation:

The paper involves the following formulas:

1. **Decision tree depth control in the random forest algorithm**:
\[ T_i = \begin{cases} 1 & \text{if } T \leq 1 \\ 1 + \beta T & \text{if } T > 1 \end{cases} \]
where \( T = T_{\text{now}} - T_{\text{last}} \) or \( T = T_{\text{now}} - T_{\text{update}} \), depending on whether \( T_{\text{last}} \) is NULL.
2. **Gating mechanism in the LSTM network**:
\[ f_t = \sigma(W_f \cdot [h_{t - 1}, x_t] + b_f) \]
\[ i_t = \sigma(W_i \cdot [h_{t - 1}, x_t] + b_i) \]
\[ o_t = \sigma(W_o \cdot [h_{t - 1}, x_t] + b_o) \]
\[ \tilde{C}_t = \tanh(W_C \cdot [h_{t - 1}, x_t] + b_C) \]
\[ C_t = f_t \ast C_{t - 1} + i_t \ast \tilde{C}_t \]
\[ h_t = o_t \ast \tanh(C_t) \]
where \( \sigma \) is the sigmoid function and \( \ast \) denotes element-wise multiplication.
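The LSTM gate equations can be traced with a small pure-Python sketch of a single time step. This is a generic LSTM cell written for illustration, not the authors' code. The names `sigmoid`, `affine` and `lstm_step`, and the parameter layout (a dictionary mapping each gate to a weight matrix and bias that act on the concatenation \( [h_{t-1}, x_t] \)), are assumptions. The output uses the standard LSTM equation \( h_t = o_t \ast \tanh(C_t) \).

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Vector]]  # hypothetical layout: gate name -> (W, b)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Return W . v + b, with W stored row-major."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns (h_t, C_t).

    Each W has shape hidden_size x (hidden_size + input_size) and multiplies
    the concatenation [h_{t-1}, x_t]; '*' in the equations is element-wise.
    """
    hx = h_prev + x_t  # [h_{t-1}, x_t] via list concatenation
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_o, b_o = params["o"]
    W_C, b_C = params["C"]
    f_t = [sigmoid(z) for z in affine(W_f, hx, b_f)]        # forget gate
    i_t = [sigmoid(z) for z in affine(W_i, hx, b_i)]        # input gate
    o_t = [sigmoid(z) for z in affine(W_o, hx, b_o)]        # output gate
    c_tilde = [math.tanh(z) for z in affine(W_C, hx, b_C)]  # candidate cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Usage with hidden_size = 1, input_size = 1 and all-zero toy parameters.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "C": zero}
h, c = lstm_step([1.0], [0.0], [2.0], params)
# Every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0,
# so C_t = 0.5 * 2.0 = 1.0 and h_t = 0.5 * tanh(1.0), about 0.3808.
print(h, c)
```

A deep-learning library would batch these operations on tensors. Plain lists are used here only to keep the sketch dependency-free and to make each equation visible on one line.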