Abstract: The continuous emergence of malware threatens the Android platform and user privacy. As the Android system and malware evolve, it is challenging to design a method that accurately identifies the categories of sophisticated malware, including known and unknown families and their obfuscated variants, because such samples may be newly emerging and lack available detection knowledge. Although some methods use anomaly detection and zero-shot techniques to identify unseen applications, they are limited to binary classification or lack robustness, stability, universality, and interpretability in multi-class identification. To this end, we first propose a generic meta-features mining algorithm that discovers the potential relationships between samples belonging to the same category. We then present metaNet, a novel method that leverages meta-features to identify sophisticated Android malware. Specifically, metaNet is powered by four main components: (i) mExtractor, a feature collector that obtains static and dynamic features; (ii) mProcessor, which derives the unique meta-features of each category from the extracted features; (iii) mLearner, a machine learning suite that leverages features and meta-features to design and train a classifier called HSU-Net; and (iv) mEnforcer, a flexible deployer that identifies the categories of malware families in the real world. We implement a prototype of metaNet in 15K lines of Python code and compare it with state-of-the-art (SOTA) methods. The results show that it not only achieves superior performance in binary classification for known families (99.52% accuracy) and unknown families (99.31% accuracy when trained with 80% known families), but also performs well in multi-class identification, with 99.05% and 93.45% accuracy for known and unknown families, respectively. Furthermore, we deploy and evaluate metaNet in the real world.
It identifies applications with acceptable time and memory overheads, i.e., an average of 11.8s and 56MB per sample with a size of 8MB. In addition, the few-shot detection and feature perturbation experiments reflect its robustness and stability, which benefit from meta-features. Finally, we collect the traffic of 112 decentralized applications (DApps) belonging to 16 categories, such as finance and health, and evaluate metaNet on DApp identification. The results illustrate its applicability across various tasks: it accurately classifies 94.6% and 81.36% of DApp flows in the all-known and 80%-known DApp scenarios, respectively, outperforming the SOTA methods.
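The "80% known families" setting can be illustrated with a minimal sketch of a family-level holdout split. This is not the authors' code. It assumes that "80% known" means 80% of the families are visible during training and the remaining families are withheld entirely; the abstract does not spell out the protocol. The function name `split_by_family` and the `(sample_id, family)` input format are hypothetical.

```python
import random
from collections import defaultdict


def split_by_family(samples, known_ratio=0.8, seed=0):
    """Split (sample_id, family) pairs so that whole families are held out.

    Assumption: "unknown" families contribute no samples to training.
    Returns (train, held_out, known_families, unknown_families).
    """
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    # Shuffle the family names deterministically, then cut at known_ratio.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)
    n_known = max(1, round(known_ratio * len(families)))
    known = set(families[:n_known])
    unknown = set(families[n_known:])

    train = [(s, f) for f in sorted(known) for s in by_family[f]]
    held_out = [(s, f) for f in sorted(unknown) for s in by_family[f]]
    return train, held_out, known, unknown


# Hypothetical toy data: 10 families with 3 samples each.
toy = [(f"app_{fam}_{i}", f"family_{fam}") for fam in range(10) for i in range(3)]
train, held_out, known, unknown = split_by_family(toy)
```

In a full evaluation, some samples from the known families would also be held back for testing; the sketch only shows the family-level separation.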

Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API

K-Means Clustering Analysis Based On Adaptive Weights For Malicious Code Detection

Online Clustering of Known and Emerging Malware Families

Android Malware Clustering through Malicious Payload Mining

Malware Analysis Using Machine Learning and Deep Learning Techniques

Malware Classification based on Call Graph Clustering

Metanet: Interpretable Unknown Mobile Malware Identification with a Novel Meta-Features Mining Algorithm

Clustering based opcode graph generation for malware variant detection

Machine Learning based Malware Detection in Cloud Environment using Clustering Approach

Semi-supervised classification for dynamic Android malware detection

Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Android Malware Clustering using Community Detection on Android Packages Similarity Network

Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection

NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls

Discovering Malicious Signatures in Software from Structural Interactions

SCGDet: Malware Detection using Semantic Features Based on Reachability Relation

A Study on the Application of Distributed System Technology-Guided Machine Learning in Malware Detection

MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning

An Efficient DenseNet-Based Deep Learning Model for Malware Detection

Automatic Malware Description via Attribute Tagging and Similarity Embedding

A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence