Outsmarting Android Malware with Cutting-Edge Feature Engineering and Machine Learning Techniques

Ahsan Wajahat, Jingsha He, Nafei Zhu, Tariq Mahmood, Tanzila Saba, Amjad Rehman Khan, Faten S. Alamri
DOI: https://doi.org/10.32604/cmc.2024.047530
2024-01-01
Abstract: The growing usage of Android smartphones has led to a significant rise in incidents of Android malware and privacy breaches. This escalating security concern necessitates the development of advanced technologies capable of automatically detecting and mitigating malicious activities in Android applications (apps). Such technologies are crucial for safeguarding user data and maintaining the integrity of mobile devices in an increasingly digital world. Current methods for detecting sensitive data leaks in Android apps are hampered by two major limitations: they require substantial computational resources and are prone to a high frequency of false positives. In practice, these methods often consume considerable processing power and mistakenly flag benign activities as malicious, which makes malware detection less efficient and less reliable. The proposed approach includes a data preprocessing step that removes duplicate samples, manages unbalanced datasets, corrects inconsistencies, and imputes missing values to ensure data accuracy. The Minimax method is then used to normalize numerical data, followed by feature vector extraction using the gain ratio and the chi-squared test to identify and extract the most significant characteristics using an appropriate prediction model. This study focuses on extracting a subset of attributes best suited for the task and recommending a predictive model based on domain expert opinion. The proposed method is evaluated using the Drebin and TUANDROMD datasets, containing 15,036 and 4,464 benign and malicious samples, respectively. The empirical results show that the Random Forest (RF) and Support Vector Machine (SVC) classifiers achieved impressive accuracy rates of 98.9% and 98.8%, respectively, in detecting unknown Android malware. A sensitivity analysis experiment was also carried out on all three ML-based classifiers based on MAE, MSE, R², and sensitivity parameters, resulting in flawless performance for both datasets. This approach has substantial potential for real-world applications and can serve as a valuable tool for preventing the spread of Android malware and enhancing mobile device security.
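The normalization and feature-scoring steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It assumes that "Minimax" refers to ordinary min-max scaling to [0, 1], that the gain ratio is information gain divided by split information, and that the chi-squared score is the Pearson statistic of the feature/label contingency table. The feature names `send_sms` and `internet` and the eight toy samples are hypothetical and are not taken from Drebin or TUANDROMD; this is not the authors' implementation.

```python
import math
from collections import Counter


def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split info."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals, label_totals = Counter(feature), Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "send_sms": [1, 1, 1, 0, 0, 0, 0, 0],  # mostly present in malware
    "internet": [1, 0, 1, 0, 1, 0, 1, 0],  # evenly spread, uninformative
}

# Rank features by chi-squared score, highest first.
ranking = sorted(features, key=lambda k: chi_squared(features[k], labels),
                 reverse=True)
for name in ranking:
    col = features[name]
    print(name, round(chi_squared(col, labels), 3),
          round(gain_ratio(col, labels), 3))
```

In this toy setting the permission that appears mostly in malicious samples scores higher on both criteria than the one spread evenly across classes, which is the behaviour a feature-selection step relies on when keeping only the top-ranked attributes before training a classifier.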