A Systematic Assessment on Android Third-party Library Detection Tools
Xian Zhan, Tianming Liu, Yepang Liu, Yang Liu, Li Li, Haoyu Wang, Xiapu Luo
DOI: https://doi.org/10.1109/tse.2021.3115506
IF: 7.4
2021-01-01
IEEE Transactions on Software Engineering
Abstract: Third-party libraries (TPLs) have become a significant part of the Android ecosystem. Developers can employ various TPLs to facilitate their app development. Unfortunately, the popularity of TPLs also brings new security issues. For example, TPLs may carry malicious or vulnerable code, which can infect popular apps and pose threats to mobile users. Furthermore, TPL detection is essential for downstream tasks, such as vulnerability and malware detection. Thus, various tools have been developed to identify TPLs. However, no existing work has studied these TPL detection tools in detail, and different tools focus on different applications and techniques, with differing performance. A comprehensive understanding of these tools will help us make better use of them. To this end, we conduct a comprehensive empirical study to fill the gap by evaluating and comparing all publicly available TPL detection tools based on six criteria: accuracy of TPL construction, effectiveness, efficiency, accuracy of version identification, resiliency to code obfuscation, and ease of use. In addition, we enhance these open-source tools by fixing their limitations to improve their detection ability. Finally, we build an extensible framework that integrates all existing available TPL detection tools, providing an online service for the research community. We release the evaluation dataset and enhanced tools. Based on our study, we also present the essential findings and discuss promising implications for the community; e.g., 1) Most existing TPL detection techniques depend, to varying degrees, on package structure to construct in-app TPL candidates. However, using package structure as the module decoupling feature is error-prone. We hence suggest that future researchers use class dependency in place of package structure. 2) Extracted features that include richer semantic information (e.g., class dependencies) can achieve better resiliency to code obfuscation.
3) Existing tools usually have low recall because they ignore some features of Android apps and TPLs, such as the compilation mechanism, the new format of TPLs, and TPL dependency. Most existing tools cannot effectively find partially imported TPLs or obfuscated TPLs, which directly limits their capability. 4) Existing tools are complementary to each other; we can build a better tool by combining the advantages of each tool. We believe our work provides a clear picture of existing TPL detection techniques and also gives a roadmap for future research.
engineering, electrical & electronic; computer science, software engineering