Active learning framework for Android unknown malware detection

Hua Zhu
DOI: https://doi.org/10.1201/9781315210445-63
2017-06-26
Abstract: Real-world applications involve many unlabeled Android examples, and labeling them manually is costly. In this paper, we propose an active learning framework that addresses this problem in order to detect Android malware. Within the framework, Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR) and Support Vector Machine (SVM) classifiers are used to label Android examples. The results indicate that this approach is effective at detecting Android malware.

Various efforts have been made to detect and analyze malicious applications in order to mitigate the threat to mobile devices. There are three main approaches: static analysis, dynamic analysis and hybrid analysis. Static analysis examines an application's source code without executing it, as in the work of W. Enck et al. Dynamic analysis monitors an application's activity at runtime, as in Crowdroid.

Because malicious Android applications are growing rapidly in number and variety, traditional detection methods cannot identify new, unknown malicious applications. A mechanism based on an active learning framework is therefore needed to detect them with high efficiency and accuracy. Active learning mainly imitates the way people learn: a selection method extracts the most informative samples from the data, these selected samples are labeled manually, and the classifier is then trained on this more informative data.
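The active learning loop described in the abstract (select the most informative samples, have them labeled manually, retrain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name a selection rule, so the sketch assumes pool-based uncertainty sampling; the classifier is a small hand-written Bernoulli Naive Bayes; `oracle` is a hypothetical stand-in for the human annotator; and the binary feature vectors and labeling rule are synthetic.

```python
import itertools
import math

def train_nb(X, y):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing."""
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        # Smoothed P(feature j = 1 | class c); always strictly between 0 and 1.
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model

def prob_malware(model, x):
    """Return P(label = 1 | x) under the fitted model."""
    log_scores = {}
    for c, (prior, probs) in model.items():
        s = math.log(prior)
        for xj, pj in zip(x, probs):
            s += math.log(pj if xj else 1.0 - pj)
        log_scores[c] = s
    m = max(log_scores.values())  # subtract the max for numerical stability
    e0 = math.exp(log_scores[0] - m)
    e1 = math.exp(log_scores[1] - m)
    return e1 / (e0 + e1)

def fit_on(X_pool, labeled):
    """Train on the currently labeled subset of the pool."""
    idx = sorted(labeled)
    return train_nb([X_pool[i] for i in idx], [labeled[i] for i in idx])

def active_learn(X_pool, oracle, seed_idx, budget):
    """Pool-based active learning with uncertainty sampling (assumed rule)."""
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
        if not unlabeled:
            break
        model = fit_on(X_pool, labeled)
        # Query the sample whose predicted probability is closest to 0.5,
        # i.e. the one the current model is least certain about.
        query = min(unlabeled,
                    key=lambda i: abs(prob_malware(model, X_pool[i]) - 0.5))
        labeled[query] = oracle(query)  # manual labeling step
    return fit_on(X_pool, labeled), labeled

# Synthetic pool: every combination of four hypothetical binary features.
pool = list(itertools.product((0, 1), repeat=4))

def oracle(i):
    """Hypothetical annotator: a sample is 'malware' iff feature 0 is set."""
    return pool[i][0]

# Seed with one example of each class, then spend six label queries.
model, labeled = active_learn(pool, oracle, seed_idx=[0, 15], budget=6)
print(f"labeled {len(labeled)} of {len(pool)} samples")
```

Any of the four classifiers named in the abstract could take the place of `train_nb` and `prob_malware`, provided it exposes a class probability or a comparable confidence score for the selection step.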