Symbolic Execution with Test Cases Generated by Large Language Models

Jiahe Xu, Jingwei Xu, Taolue Chen, Xiaoxing Ma
DOI: https://doi.org/10.1109/qrs62785.2024.00031
2024-01-01
Abstract: Symbolic execution is a powerful program analysis technique. External environment construction and internal path explosion are two long-standing problems that can limit the effectiveness and performance of symbolic execution on complex programs. The intrinsic challenge is to understand the program context well enough to construct a set of execution environments that can guide the selection of symbolic states. In this paper, we propose LangSym, a novel program-context-guided symbolic execution framework based on the program’s instruction/user manual. Leveraging the natural language understanding and code generation capabilities of large language models (LLMs), LangSym automatically extracts knowledge about the functionality of the program and generates adequate test cases and the corresponding environments as prior knowledge for symbolic execution. We instantiate LangSym in KLEE, a widely adopted symbolic execution engine, to build a pipeline that automatically leverages LLMs to boost symbolic execution. We evaluate LangSym on almost all GNU Coreutils programs and a considerable number of large-scale programs, showing that LangSym outperforms the existing strategies in KLEE with at least a 10% increase in line coverage.