Abstract: Symbolic execution is an automated test input generation technique that models individual program paths as logical constraints. However, the realism of the concrete test inputs generated by SMT solvers often comes into question. Existing symbolic execution tools seek only arbitrary solutions to the given path constraints. These constraints do not capture the naturalness of inputs that follow statistical distributions, range constraints, or preferred string constants. The result is unnatural-looking inputs that fail to emulate real-world data. In this paper, we extend symbolic execution to account for input naturalness. Our key insight is that users typically understand the semantics of program inputs, such as the distribution of height or the possible values of a zipcode, and that this knowledge can be leveraged to help symbolic execution produce natural test inputs. We instantiate this idea in NaturalSym, a symbolic execution-based test generation tool for data-intensive scalable computing (DISC) applications. NaturalSym uses user-provided input semantics to generate natural-looking data that mimics real-world distributions while preserving strong bug-finding potential. On DISC applications and commercial big data test benchmarks, NaturalSym achieves a higher degree of realism, as evidenced by a median perplexity score 35.1 points lower, and detects 1.29× as many injected faults as BigTest, the state-of-the-art symbolic executor for DISC. This is because BigTest draws inputs purely based on the satisfiability of path constraints constructed from branch predicates, whereas NaturalSym draws natural concrete values based on user-specified semantics and prioritizes these values during input generation. Our empirical results demonstrate that NaturalSym finds 47.8× more injected faults than NaturalFuzz (a coverage-guided fuzzer) and 19.1× more than ChatGPT. Meanwhile, TestMiner (a mining-based approach) fails to detect any injected faults. NaturalSym is the first symbolic executor to incorporate the notion of input naturalness into symbolic path constraints during SMT-based input generation. We make our code available at https://github.com/UCLA-SEAL/NaturalSym.
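The contrast drawn in the abstract, between taking any satisfying assignment and preferring natural values that also satisfy the path constraint, can be illustrated with a small sketch. This is not NaturalSym's implementation: a brute-force search stands in for the SMT solver, and the function names, the example constraint, and the candidate values are all hypothetical, chosen only for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Row = Tuple[int, str]  # hypothetical input row: (age, zipcode)


def first_satisfying(constraint: Callable[[int, str], bool],
                     candidates: Iterable[Row]) -> Optional[Row]:
    """Return the first candidate row that satisfies the path constraint."""
    for row in candidates:
        if constraint(*row):
            return row
    return None


def generate_input(constraint: Callable[[int, str], bool],
                   natural_rows: Iterable[Row],
                   arbitrary_rows: Iterable[Row]) -> Optional[Row]:
    """Prefer user-supplied natural values; fall back to any solution."""
    natural = first_satisfying(constraint, natural_rows)
    if natural is not None:
        return natural
    return first_satisfying(constraint, arbitrary_rows)


# Hypothetical path constraint taken from one branch of a program.
def adult_in_9xxxx(age: int, zipcode: str) -> bool:
    return age > 18 and zipcode.startswith("9")


# Assumed user-provided semantics: plausible ages and real-looking zipcodes.
natural_rows = list(product([34, 27, 45, 16], ["90095", "10001", "94305"]))
# Stand-in for an arbitrary solver enumeration (brute force, not SMT).
arbitrary_rows = list(product(range(0, 1000), ["", "9", "0"]))

print(first_satisfying(adult_in_9xxxx, arbitrary_rows))              # (19, '9')
print(generate_input(adult_in_9xxxx, natural_rows, arbitrary_rows))  # (34, '90095')
```

Both results satisfy the same constraint, but only the second resembles real data. When no natural candidate satisfies a path, the sketch falls back to the arbitrary search so that the path is still covered.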
