Abstract: The impressive growth of smartphone devices, combined with the rising ubiquity of mobile platforms for sensitive applications such as Internet banking, has triggered a rapid increase in mobile malware. Many recent studies examine Machine Learning techniques as the most promising approach for mobile malware detection, without, however, quantifying the uncertainty involved in their detections. In this paper, we address this problem by proposing a machine learning dynamic analysis approach that provides provably valid confidence guarantees for each malware detection. Moreover, these guarantees hold for the malicious and benign classes independently and are unaffected by any bias in the data. The proposed approach is based on a novel machine learning framework, called Conformal Prediction, combined with a random forests classifier. We examine its performance on a large-scale dataset collected by installing 1866 malicious and 4816 benign applications on a real Android device. We make this collection of dynamic analysis data available to the research community. The obtained experimental results demonstrate the empirical validity, usefulness and unbiased nature of the outputs produced by the proposed approach.
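To make the idea of a per-detection confidence guarantee concrete, the sketch below computes label-conditional (Mondrian) inductive conformal p-values for a single test application in plain Python. It rests on stated assumptions, not on the paper's exact setup: a random forest is assumed to have been trained already on a proper training set, the nonconformity of an (app, label) pair is assumed to be 1 minus the fraction of trees voting for that label, and the calibration vote fractions are made-up numbers. The helper names (`nonconformity`, `label_conditional_p_values`, `prediction_set`, `confidence_credibility`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def nonconformity(vote_fraction: float) -> float:
    """Assumed (hypothetical) measure: 1 - fraction of trees voting for the label."""
    return 1.0 - vote_fraction


def label_conditional_p_values(
    calibration: Sequence[Tuple[str, float]],  # (true label, vote fraction for it)
    test_votes: Dict[str, float],              # candidate label -> vote fraction
) -> Dict[str, float]:
    """Label-conditional (Mondrian) ICP p-value for each candidate label."""
    # Group calibration scores by true label, so each class is calibrated separately.
    by_label: Dict[str, List[float]] = defaultdict(list)
    for label, frac in calibration:
        by_label[label].append(nonconformity(frac))
    p_values = {}
    for label, frac in test_votes.items():
        score = nonconformity(frac)
        same_class = by_label[label]
        # Count same-class calibration apps at least as nonconforming as the test app.
        n_ge = sum(s >= score for s in same_class)
        p_values[label] = (n_ge + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], epsilon: float) -> Set[str]:
    """Labels whose p-value exceeds the significance level epsilon."""
    return {label for label, p in p_values.items() if p > epsilon}


def confidence_credibility(p_values: Dict[str, float]) -> Tuple[float, float]:
    """Confidence = 1 - second-largest p-value; credibility = largest p-value."""
    ranked = sorted(p_values.values(), reverse=True)
    return 1.0 - ranked[1], ranked[0]


# Made-up calibration data: 5 malicious and 9 benign apps.
calibration = (
    [("malicious", v) for v in (0.95, 0.9, 0.8, 0.7, 0.55)]
    + [("benign", v) for v in (0.99, 0.97, 0.95, 0.9, 0.85, 0.8, 0.75, 0.6, 0.5)]
)
p = label_conditional_p_values(calibration, {"malicious": 0.75, "benign": 0.25})
print(p)                          # malicious: 3/6 = 0.5, benign: 1/10 = 0.1
print(prediction_set(p, 0.05))    # both labels survive
print(prediction_set(p, 0.2))     # only 'malicious' survives
print(confidence_credibility(p))  # (0.9, 0.5) up to float rounding
```

At significance level 0.2 only `malicious` remains in the prediction set, and the forced prediction comes with confidence 0.9 and credibility 0.5. Because each p-value is compared only against calibration apps of the same class, the probability of excluding the true label is bounded by the significance level for malicious and benign apps separately, which is the kind of class-wise guarantee the abstract describes.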


### What problem does this paper attempt to solve?

This paper addresses an important gap in mobile malware detection: **the lack of any quantification of confidence in detection results**. Although many existing studies use machine-learning techniques to detect malware on the Android platform, they provide no way to quantify the uncertainty of individual detections.

#### Main problems:

1. **Quantification of uncertainty**:
   - Existing malware detection methods do not provide a reliable confidence guarantee for each detection result.
   - This makes it difficult for users to judge how reliable a detection is, which affects their decisions (for example, whether to delete an application).
2. **Class imbalance**:
   - Malware detection data usually suffers from severe class imbalance (benign applications far outnumber malicious ones).
   - As a result, existing methods tend to be biased towards predicting the benign class, which lowers the detection rate for malware.

#### Solutions:

To address these problems, the authors propose a dynamic analysis approach based on the **Conformal Prediction (CP)** framework, combined with a Random Forests classifier, to provide **unbiased confidence guarantees**. Specific improvements include:

- **Label-conditional Mondrian Conformal Prediction (LCMCP)**: ensures that the confidence guarantees hold for the malicious and benign classes separately, unaffected by bias in the data.
- **Inductive Conformal Prediction (ICP)**: reduces computational complexity, making the approach suitable for resource-constrained environments such as mobile devices.
- **Verification on a large-scale real-world dataset**: the effectiveness and practicality of the method are evaluated on a dataset of 6,682 applications (1,866 malicious and 4,816 benign) collected on a real Android device.

#### Goals:

1. **Provide stronger within-class confidence guarantees**: ensure that valid confidence guarantees are provided for malicious and benign instances separately, avoiding bias caused by class imbalance.
2. **Evaluate performance**: evaluate the proposed method on a large-scale real-world dataset and show its advantages over the conventional Random Forests classifier.

Through these improvements, the authors aim to give users a more reliable malware detection tool, enabling them to make better decisions based on the confidence attached to each detection result.
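Why the label-conditional (Mondrian) variant matters under class imbalance can be illustrated with a small simulation. The sketch below uses synthetic nonconformity scores, not the paper's dataset: it assumes, for illustration only, that the minority malicious class receives systematically higher scores than the benign class, and that the calibration set holds 4000 benign and 1500 malicious apps. It then compares, at significance level ε = 0.1, a pooled ICP (one shared calibration set) with a label-conditional ICP (one calibration set per class). All function names and distribution parameters are hypothetical.

```python
import bisect
import random
from typing import Dict, List, Tuple

LABELS = ("benign", "malicious")


def p_value(sorted_cal: List[float], score: float) -> float:
    """Conformal p-value: (number of calibration scores >= score, plus 1) / (n + 1)."""
    n = len(sorted_cal)
    n_ge = n - bisect.bisect_left(sorted_cal, score)  # count of scores >= score
    return (n_ge + 1) / (n + 1)


def synthetic_scores(rng: random.Random, label: str, k: int) -> List[float]:
    """Hypothetical true-label nonconformity scores. The minority (malicious)
    class is assumed harder for the classifier, so its scores run higher."""
    lo, hi = (0.0, 0.5) if label == "benign" else (0.3, 1.0)
    return [rng.uniform(lo, hi) for _ in range(k)]


def per_class_errors(
    epsilon: float = 0.1, seed: int = 0
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-class error rates of a pooled ICP and a label-conditional ICP."""
    rng = random.Random(seed)
    # Imbalanced calibration set: more benign than malicious apps (assumed sizes).
    cal = {"benign": synthetic_scores(rng, "benign", 4000),
           "malicious": synthetic_scores(rng, "malicious", 1500)}
    test = {label: synthetic_scores(rng, label, 4000) for label in LABELS}

    pooled = sorted(cal["benign"] + cal["malicious"])            # one shared calibration set
    per_label = {label: sorted(cal[label]) for label in LABELS}  # Mondrian: one set per class

    pooled_err: Dict[str, float] = {}
    lc_err: Dict[str, float] = {}
    for label in LABELS:
        n = len(test[label])
        # An error means the true label's p-value is <= epsilon, so it is left out of the set.
        pooled_err[label] = sum(p_value(pooled, s) <= epsilon for s in test[label]) / n
        lc_err[label] = sum(p_value(per_label[label], s) <= epsilon for s in test[label]) / n
    return pooled_err, lc_err


pooled_err, lc_err = per_class_errors()
print("pooled ICP error per class:           ", pooled_err)
print("label-conditional ICP error per class:", lc_err)
```

Under these assumed distributions the pooled predictor makes essentially all of its errors on malicious apps (roughly a third of them are excluded from their own prediction set, while no benign app is), whereas the label-conditional predictor keeps the error rate close to ε for each class. This is the bias that calibrating each class separately is meant to remove.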

Android Malware Detection with Unbiased Confidence Guarantees
