Machine Learning Steered Symbolic Execution Framework for Complex Software Code
Bu Lei,Liang Yongjuan,Xie Zhunyi,Qian Hong,Hu Yi-Qi,Yu Yang,Chen Xin,Li Xuandong
DOI: https://doi.org/10.1007/s00165-021-00538-3
2021-01-01
Formal Aspects of Computing
Abstract:During program traversing, symbolic execution collects pathconditions and feeds them to a constraint solver to obtain feasiblesolutions. However, complex path conditions, like nonlinearconstraints, which widely appear in programs, are hard to be handledefficiently by the existing solvers. In this paper, we adapt theclassical symbolic execution framework with a machine learningapproach for constraint satisfaction. The approach samples andlearns from different solutions to identify potentially feasiblearea. This sampling-learning style solving can be applied indifferent class of complex problems easily. Therefore, incorporatingthis approach, our framework, MLBSE, supports the symbolicexecution of not only simple linear path conditions, but alsononlinear arithmetic operations, and even black-box function callsof library methods. Meanwhile, thanks to the theoretical foundationof the machine learning based approach, when the solver fails tosolve a path condition, we can have an estimation of the confidencein the satisfiability (ECS) of the problem to give users insightsabout how the problem is analyzed and whether they could ultimatelyfind a solution. We implement MLBSE on the basis of SymbolicPath Finder (SPF) into a fully automatic Java symbolic executionengine. Users can feed their code to MLBSE directly, which isvery convenient to use. To evaluate its performance, 22 real caseprograms are used as the benchmarks for MLBSE to generate testcases, which involve a total number of 1042 methods that are full ofnonlinear operations, floating-point arithmetic as well as nativemethod calls. Experiment results show that the coverage achieved byMLBSE is much higher than the state-of-the-art tools.