Automated Third-party Library Detection for Android Applications: Are We There Yet? Experience

Zhan Xian, Lingling Fan, Tianming Liu, Sen Chen, Li Li, Haoyu Wang, Yifei Xu, Xiapu Luo, Yang Liu
2020-01-01
Abstract: Third-party libraries (TPLs) have become a significant part of the Android ecosystem. Developers can employ various TPLs with different functionalities to facilitate their app development. Unfortunately, the popularity of TPLs also brings new challenges and even threats. TPLs may carry malicious or vulnerable code, which can infect many popular apps and pose threats to mobile users. Besides, the code of third-party libraries can constitute noise in some detection tasks. Thus, researchers have developed various tools to identify TPLs. However, no existing work has studied these TPL detection tools in detail; different tools focus on different applications and differ in performance, so little is known about them. To better understand existing TPL detection tools and dissect TPL detection techniques, we present an experience paper that attempts to fill this gap by evaluating and comparing all publicly available TPL detection tools on four criteria: effectiveness, efficiency, resilience to code obfuscation, and ease of use. We reveal their advantages and disadvantages based on our empirical study. The results show that most TPL detection tools achieve high precision but low recall. According to our evaluation and survey results, we recommend different tools for different application scenarios. For example, we find that LibRadar is suitable for large-scale in-app TPL detection, LibPecker is ideal for identifying obfuscated TPLs, and LibScout can identify specific library versions, which can be leveraged to find vulnerable TPLs. In addition, we enhance these open-source tools by fixing their limitations to improve their detection ability. We also build an extensible framework that integrates all available TPL detection tools and provides an online service for the research community. We make the evaluation dataset and enhanced tools publicly available. We believe our work provides a clear picture of existing TPL detection techniques and also gives a roadmap for future directions.