WeChecker: efficient and precise detection of privilege escalation vulnerabilities in Android apps.

Xingmin Cui,Jingxuan Wang,Lucas Chi Kwong Hui,Zhongwei Xie,Tian Zeng,Siu-Ming Yiu
DOI: https://doi.org/10.1145/2766498.2766509
2015-01-01
Abstract:Due to the rapid increase of Android apps and their wide usage to handle personal data, a precise and large-scaling checker is in need to validate the apps' permission flow before they are listed on the market. Several tools have been proposed to detect sensitive data leaks in Android apps. But these tools are not applicable to large-scale analysis since they fail to deal with the arbitrary execution orders of different event handlers smartly. Event handlers are invoked by the framework based on the system state, therefore we cannot pre-determine their order of execution. Besides, since all exported components can be invoked by an external app, the execution orders of these components are also arbitrary. A naive way to simulate these two types of arbitrary execution orders yields a permutation of all event handlers in an app. The time complexity is O(n!) where n is the number of event handlers in an app. This leads to a high analysis overhead when n is big. To give an illustration, CHEX [10] found 50.73 entry points of 44 unique class types in an app on average. In this paper we propose an improved static taint analysis to deal with the challenge brought by the arbitrary execution orders without sacrificing the high precision. Our analysis does not need to make permutations and achieves a polynomial time complexity. We also propose to unify the array and map access with object reference by propagating access paths to reduce the number of false positives due to field-insensitivity and over approximation of array access and map access. We implement a tool, WeChecker, to detect privilege escalation vulnerabilities [7] in Android apps. WeChecker achieves 96% precision and 96% recall in the state-of-the-art test suite DriodBench (for compairson, the precision and recall of FlowDroid [1] are 86% and 93%, respectively). The evaluation of WeChecker on real apps shows that it is efficient (average analysis time of each app: 29.985s) and fits for large-scale checking.
What problem does this paper attempt to address?