An Android malware detection approach using multi-feature fusion and TF-IDF algorithm

Haipeng Zhang, Shudong Li, Xiaobo Wu, Weihong Han, Laifu Wang
DOI: https://doi.org/10.1109/DSC53577.2021.00100
2021-01-01
Abstract: With the popularity of smartphones, Android quickly came to dominate the market. At the same time, millions of new Android malware samples appear every year, causing great losses to users, so it is crucial to detect unknown Android malware efficiently before it is installed. Machine learning has achieved outstanding performance in many fields, including Android malware detection, and many machine-learning-based detection methods have been proposed. These methods can be classified as dynamic analysis, static analysis, or hybrid analysis according to the kind of features they use. Dynamic analysis is time-consuming because it requires running the software. Moreover, it is hard to capture all malicious behaviors because of anti-sandbox techniques: malware may change its behavior to avoid detection when it is running in a sandbox. In this paper, we propose a new Android malware detection approach using multiple features and the TF-IDF algorithm. The static features we use cover the permissions and API calls of each Android application and the size of the Android application package. We use BOW (bag of words) to process the permission feature, standardization to process the package size feature, and the TF-IDF algorithm to handle the API call features. Machine-learning algorithms including AdaBoost, LinearSVC, and GaussianNB are used for classification. Experimental results on public datasets demonstrate that our system can detect Android malware efficiently.
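The feature-fusion step described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The toy apps, the field names (`permissions`, `api_calls`, `apk_size_kb`), the helper functions, and the textbook TF-IDF variant (term frequency times log(N/df)) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical toy apps; the values are invented and not taken from the paper's datasets.
apps = [
    {"permissions": ["INTERNET", "SEND_SMS"],
     "api_calls": ["sendTextMessage", "sendTextMessage", "getDeviceId"],
     "apk_size_kb": 900.0},
    {"permissions": ["INTERNET"],
     "api_calls": ["openConnection", "getDeviceId"],
     "apk_size_kb": 4200.0},
    {"permissions": ["INTERNET", "CAMERA"],
     "api_calls": ["openConnection", "takePicture"],
     "apk_size_kb": 7500.0},
]


def bag_of_words(docs):
    """Count vector per document over a sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([float(counts[tok]) for tok in vocab])
    return vocab, rows


def tf_idf(docs):
    """Textbook TF-IDF (an assumed variant): (count / doc length) * log(N / df)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows


def standardize(values):
    """Zero-mean, unit-variance scaling (population standard deviation)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant input
    return [(v - mean) / std for v in values]


def fuse(apps):
    """Concatenate the three per-app feature groups into one vector per app."""
    _, perm_rows = bag_of_words([a["permissions"] for a in apps])
    _, api_rows = tf_idf([a["api_calls"] for a in apps])
    sizes = standardize([a["apk_size_kb"] for a in apps])
    return [p + t + [s] for p, t, s in zip(perm_rows, api_rows, sizes)]


features = fuse(apps)
```

Each row of `features` is one fused vector: permission counts, then TF-IDF weights for API calls, then the standardized package size. Rows like these could then be passed to a classifier such as the ones named in the abstract; that step is omitted here.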