Abstract: Android is a growing target for malicious software (malware) because of its popularity and functionality. Malware poses a serious threat to users' privacy, money, devices, and file integrity. A series of data-driven malware detection methods have been proposed. However, these methods face two key challenges: (1) how to learn an effective feature representation from raw data; (2) how to reduce the dependence on prior knowledge or human labor in feature learning. Inspired by the success of deep learning methods in the feature representation learning community, we propose a malware detection framework that starts by learning rich features with a novel unsupervised feature learning algorithm, Merged Sparse Auto-Encoder (MSAE). To extract more compact and discriminative features from the rich features and further boost malware detection capability, we establish a hybrid deep network learning algorithm, Stacked Hybrid Learning MSAE and SDAE (SHLMD), by further incorporating a classical deep learning method, Stacked Denoising Auto-encoders (SDAE). We then feed the features learned by MSAE and SHLMD, respectively, to classification algorithms, e.g., Support Vector Machine (SVM) or K-Nearest Neighbor (KNN), to train a malware detection model. Evaluation results on two real-world datasets demonstrate that SHLMD achieves 94.46 and 90.57 percent accuracy, respectively, outperforming the classical unsupervised feature representation learning method Sparse Auto-encoder (SAE). MSAE performs similarly to SAE. SHLMD further improves on the performance of MSAE and of the supervised fine-tuned method SDAE. In addition, we compare the performance of our methods with that of state-of-the-art detection approaches, including classical deep-learning-based methods. Extensive experiments show that our proposed methods are effective at detecting Android malware.
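The two-stage pipeline described above (unsupervised feature learning, then a conventional classifier on the learned features) can be illustrated with a minimal sketch. It is not MSAE or SHLMD, whose details the abstract does not give. It assumes a plain one-hidden-layer sparse auto-encoder with an L1 penalty on the hidden activations, a hand-written k-nearest-neighbor classifier, and synthetic binary feature vectors standing in for app features. All function names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_toy_data(n_per_class=100, d=20, seed=0):
    """Synthetic binary vectors (assumption): each class favors a different half of the bits."""
    rng = np.random.default_rng(seed)
    half = d // 2
    p_benign = np.r_[np.full(half, 0.8), np.full(d - half, 0.1)]
    p_malware = p_benign[::-1]
    X = np.vstack([rng.random((n_per_class, d)) < p_benign,
                   rng.random((n_per_class, d)) < p_malware]).astype(float)
    y = np.r_[np.zeros(n_per_class, int), np.ones(n_per_class, int)]
    idx = rng.permutation(len(y))
    return X[idx], y[idx]

def train_sparse_autoencoder(X, n_hidden=8, l1=1e-3, lr=0.5, epochs=300, seed=0):
    """Full-batch gradient descent on squared reconstruction error plus an L1 sparsity penalty."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden code (the learned features)
        R = sigmoid(H @ W2 + b2)            # reconstruction of the input
        losses.append(float(np.mean((R - X) ** 2)))
        dR = (R - X) * R * (1 - R) / n      # gradient at the output pre-activation
        # H is always positive, so the gradient of the L1 penalty on H is constant
        dH = (dR @ W2.T + l1 / n) * H * (1 - H)
        W2 -= lr * H.T @ dR; b2 -= lr * dR.sum(0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)
    return (W1, b1), losses

def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training points (binary labels 0/1, odd k)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)

X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Stage 1: learn features without looking at the labels.
(W1, b1), losses = train_sparse_autoencoder(X_train)
encode = lambda A: sigmoid(A @ W1 + b1)

# Stage 2: train and evaluate a classifier on the encoded features.
pred = knn_predict(encode(X_train), y_train, encode(X_test))
accuracy = float((pred == y_test).mean())
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy: {accuracy:.2f}")
```

The labels are used only in the second stage, which mirrors the separation between unsupervised representation learning and supervised classification described in the abstract. Swapping the k-NN step for an SVM would not change the first stage.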

A Hybrid Deep Network Framework for Android Malware Detection
