Abstract: Machine learning has shown promise in the literature for improving the accuracy of Android malware detection. However, it remains challenging to (1) stay robust in real-world scenarios and (2) provide interpretable explanations that experts can analyse. In this article, we propose MsDroid, an Android malware detection system that makes decisions by identifying malicious snippets and accompanies them with interpretable explanations. We mimic a common practice of security analysts, i.e., filtering APIs before looking through each method, and focus on local snippets around sensitive APIs instead of the whole program. Each snippet is represented as a graph that encodes both code attributes and domain knowledge, and is then classified by a Graph Neural Network (GNN). The local perspective helps the GNN classifier concentrate on code that is highly correlated with malicious behaviors, and the information contained in the graphs supports a better understanding of those behaviors. Hence, MsDroid is more robust and interpretable by nature. To identify malicious snippets, we present a semi-supervised learning approach that requires only app-level labels. The key insight is that malicious snippets exist only in malware, and every malware app contains at least one of them. To make malicious snippets less opaque, we design an explanation mechanism that shows the importance of control flows and retrieves similarly implemented snippets from known malware. A comprehensive comparison with 5 baseline methods is conducted on a dataset of more than 81K apps in 3 real-world scenarios: zero-day, evolution, and obfuscation. The experimental results show that MsDroid is more robust than state-of-the-art systems in all cases, with an F1-score advantage of 5.37% to 49.52%. In addition, we demonstrate that the provided explanations are effective and illustrate how they facilitate malware analysis.
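The two ideas in the abstract, local snippets around sensitive APIs and app-level labels that constrain snippet-level decisions, can be illustrated with a minimal sketch. This is not MsDroid's implementation: the hop-bounded neighborhood, the dictionary call-graph format, the max aggregation, the 0.5 threshold, and every name below (`local_snippet`, `app_score`, `is_malware`, the toy method names) are assumptions chosen for illustration, and the snippet scores stand in for the output of a GNN classifier that is not shown.

```python
from collections import deque


def local_snippet(call_graph, sensitive_api, hops=2):
    """Collect the methods within `hops` call edges of a sensitive API.

    `call_graph` maps a caller to the list of methods it calls. Edges are
    treated as undirected so that both callers and callees are included.
    The hop bound is an illustrative assumption, not the paper's definition.
    """
    neighbours = {}
    for caller, callees in call_graph.items():
        neighbours.setdefault(caller, set())
        for callee in callees:
            neighbours[caller].add(callee)
            neighbours.setdefault(callee, set()).add(caller)

    seen = {sensitive_api}
    queue = deque([(sensitive_api, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def app_score(snippet_scores):
    """Aggregate snippet scores into one app score.

    Taking the maximum encodes the stated insight: a benign app has no
    malicious snippet, and a malware app has at least one.
    """
    return max(snippet_scores, default=0.0)


def is_malware(snippet_scores, threshold=0.5):
    """Flag an app when its most suspicious snippet reaches the threshold."""
    return app_score(snippet_scores) >= threshold


# Toy call graph with hypothetical method names.
toy_graph = {
    "onCreate": ["loadConfig", "sendReport"],
    "sendReport": ["SmsManager.sendTextMessage"],
    "loadConfig": ["readFile"],
}
snippet = local_snippet(toy_graph, "SmsManager.sendTextMessage", hops=2)

# Hypothetical per-snippet scores, standing in for classifier output.
benign_app = [0.05, 0.10, 0.02]
malware_app = [0.05, 0.91, 0.10]
```

With this aggregation, a single high-scoring snippet is enough to flag an app, and the snippet that attains the maximum is the natural candidate to present to an analyst.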

LSCDroid: Malware Detection Based on Local Sensitive API Invocation Sequences

LSTM Android Malicious Behavior Analysis Based on Feature Weighting

Android Malware Detection Based on System Call Sequences and LSTM

An Android Malware Detection Method Using Deep Learning Based on API Calls

Droidetec: Android Malware Detection and Malicious Code Localization through Deep Learning

An Efficient Android Malware Detection System Based on Method-Level Behavioral Semantic Analysis

Android Malware Detection using Deep Learning on API Method Sequences

Hybrid Sequence‐based Android Malware Detection Using Natural Language Processing

Detection of Malicious Behavior in Android Apps Through API Calls and Permission Uses Analysis

Cscdroid: Accurately Detect Android Malware Via Contribution-Level-Based System Call Categorization

Detecting Android Malware Based on Dynamic Feature Sequence and Attention Mechanism

Machine Learning-Based Malicious Application Detection of Android

API Sequences Based Malware Detection for Android

A Deep Learning Approach To Android Malware Feature Learning And Detection

MsDroid: Identifying Malicious Snippets for Android Malware Detection

An End-to-end Model for Android Malware Detection

ImageDroid: Using Deep Learning to Efficiently Detect Android Malware and Automatically Mark Malicious Features

OpCode-Level Function Call Graph Based Android Malware Classification Using Deep Learning

A Detection Method and System Implementation for Android Malware

A Combination Method for Android Malware Detection Based on Control Flow Graphs and Machine Learning Algorithms

Droiddetector: Android Malware Characterization and Detection Using Deep Learning