LibAlchemy: A Two-Layer Persistent Summary Design for Taming Third-Party Libraries in Static Bug-Finding Systems

Rongxin Wu, Yuxuan He, Jiafeng Huang, Chengpeng Wang, Wensheng Tang, Qingkai Shi, Xiao, Charles Zhang
DOI: https://doi.org/10.1145/3597503.3639132
2024-01-01
Abstract: Despite the benefits of using third-party libraries (TPLs), the misuse of TPL functions raises quality and security concerns. Using traditional static analysis to detect bugs caused by TPL functions is nontrivial. One promising solution would be to automatically generate and persist the summaries of TPL functions offline and then reuse these summaries in compositional static analysis online. However, when dealing with millions of lines of TPL code, the summaries designed by existing studies suffer from an unresolved paradox. That is, a highly precise form of summary leads to an unaffordable space and time overhead, while an imprecise one seriously hurts its precision or recall. To address the paradox, we propose a novel two-layer summary design. The first layer utilizes a line-sized program representation known as the program dependence graph to compactly encode path conditions, while the second layer encodes bug-type-specific properties. We implemented our idea as a tool called LibAlchemy and evaluated it on fifteen mature and extensively checked open-source projects. Experimental results show that LibAlchemy can check over ten million lines of code within ten hours. LibAlchemy has detected 55 true bugs with a high precision of 90.16%, eleven of which have been assigned CVE IDs. Compared to whole-program analysis and the conventional design of path-sensitively precise summaries, LibAlchemy achieves an 18.56× and 12.77× speedup and saves 91.49% and 90.51% of memory usage, respectively.