Scaling static taint analysis to industrial SOA applications: a case study at Alibaba

Jie Wang,Yunguang Wu,Gang Zhou,Yiming Yu,Zhenyu Guo,Yingfei Xiong
DOI: https://doi.org/10.1145/3368089.3417059
2020-11-07
Abstract:In Alibaba, we have seen a growing demand for tracing data flow for scenarios such as data leak detection, change governance, and data consistency checking. Static taint analysis is a technique for such problems, and many approaches are proposed for high scalability and precision. This paper shares our experience in applying taint analysis in Alibaba. In particular, we find that the state-of-the-art taint analysis tool, FlowDroid, does not work well in our cases because our applications make heavy use of libraries, native methods and enterprise-specific frameworks, which impose two major challenges, scalability and implicit dependency, to FlowDroid. This paper presents ANTaint to address these problems. ANTaint improves scalability by expanding the call graph and applying taint propagation on demand for libraries, which account for majority of the program execution but only a small fraction propagates taints. To improve accuracy, we ensure to build a sound call graph with its core part having certain accuracy, and providing a more precise taint propagation model. The practice of applying ANTaint in the company workload validates the idea. According to an experiment on 60 production cases, ANTaint is correct for 95% of the cases (precision: 95%, recall: 98%) while FlowDroid is 13%. ANTaint takes 65% less time and none of the cases run out of memory with 32 GB limitation.
What problem does this paper attempt to address?