Abstract: Symbolic execution is an automated test input generation technique that models individual program paths as logical constraints. However, the realism of the concrete test inputs generated by SMT solvers often comes into question. Existing symbolic execution tools seek only arbitrary solutions to the given path constraints. These constraints do not capture the naturalness of inputs, such as adherence to statistical distributions, range constraints, or preferred string constants. The result is unnatural-looking inputs that fail to emulate real-world data. In this paper, we extend symbolic execution to take input naturalness into account. Our key insight is that users typically understand the semantics of program inputs, such as the distribution of height or the possible values of a zipcode, and that this knowledge can be leveraged to help symbolic execution produce natural test inputs. We instantiate this idea in NaturalSym, a symbolic execution-based test generation tool for data-intensive scalable computing (DISC) applications. NaturalSym uses user-provided input semantics to generate natural-looking data that mimics real-world distributions while preserving strong bug-finding potential. On DISC applications and commercial big data test benchmarks, NaturalSym achieves a higher degree of realism, as evidenced by a median perplexity score 35.1 points lower, and detects 1.29× as many injected faults as the state-of-the-art symbolic executor for DISC, BigTest. This is because BigTest draws inputs purely based on the satisfiability of path constraints constructed from branch predicates, whereas NaturalSym draws natural concrete values based on user-specified semantics and prioritizes these values during input generation. Our empirical results demonstrate that NaturalSym finds 47.8× more injected faults than NaturalFuzz (a coverage-guided fuzzer) and 19.1× more than ChatGPT. Meanwhile, TestMiner (a mining-based approach) fails to detect any injected faults. NaturalSym is the first symbolic executor that incorporates the notion of input naturalness into symbolic path constraints during SMT-based input generation. We make our code available at https://github.com/UCLA-SEAL/NaturalSym.
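The contrast between an arbitrary satisfying solution and a natural one can be illustrated with a small sketch. This is not NaturalSym's implementation, which works on SMT path constraints; it is a simplified stand-in in plain Python in which the path constraint is an ordinary predicate, and the names `FieldSemantics`, `natural_candidates`, `generate_natural_input`, and the example fields and values are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class FieldSemantics:
    """Hypothetical user-provided semantics for one input field."""
    preferred: List[object]  # preferred constants, e.g. plausible zipcodes
    sampler: Optional[Callable[[random.Random], object]] = None  # e.g. a height distribution


def natural_candidates(sem: FieldSemantics, rng: random.Random, n_samples: int = 200) -> List[object]:
    """Preferred constants first, then draws from the user-specified distribution."""
    candidates = list(sem.preferred)
    if sem.sampler is not None:
        candidates.extend(sem.sampler(rng) for _ in range(n_samples))
    return candidates


def generate_natural_input(
    path_constraint: Callable[[Row], bool],
    semantics: Dict[str, FieldSemantics],
    arbitrary_solution: Row,
    seed: int = 0,
) -> Row:
    """Start from an arbitrary satisfying row (standing in for raw solver output)
    and greedily swap in natural values while the path constraint still holds."""
    if not path_constraint(arbitrary_solution):
        raise ValueError("the starting row must satisfy the path constraint")
    rng = random.Random(seed)
    row = dict(arbitrary_solution)
    for field, sem in semantics.items():
        for value in natural_candidates(sem, rng):
            trial = {**row, field: value}
            if path_constraint(trial):
                row = trial  # keep the first natural value that stays on this path
                break
    return row


# Assumed example path: the branch is taken for tall people in zipcodes starting with "9".
def path_constraint(row: Row) -> bool:
    return row["height_cm"] > 150 and row["zipcode"].startswith("9")


semantics = {
    "height_cm": FieldSemantics(preferred=[], sampler=lambda rng: round(rng.gauss(170, 10), 1)),
    "zipcode": FieldSemantics(preferred=["10001", "90095", "94305"]),
}

# A solver that only cares about satisfiability might return something like this.
arbitrary = {"height_cm": 151, "zipcode": "9"}

row = generate_natural_input(path_constraint, semantics, arbitrary)
print(row)
```

Both `arbitrary` and `row` exercise the same path, but only the second resembles real data: the zipcode comes from the preferred constants and the height from the assumed distribution. The greedy per-field replacement is a simplification chosen to keep the sketch dependency-free.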

On Benchmarking the Capability of Symbolic Execution Tools with Logic Bombs

EXAMINER-PRO: Testing Arm Emulators Across Different Privileges

Symbolic Execution with Test Cases Generated by Large Language Models

Steering Symbolic Execution to Less Traveled Paths

Symbolic execution of floating-point programs: How far are we?

Boosting Symbolic Execution Via Constraint Solving Time Prediction (Experience Paper)

Concolic Execution on Small-Size Binaries: Challenges and Empirical Study

Divide, Conquer and Verify: Improving Symbolic Execution Performance

Natural Symbolic Execution-Based Testing for Big Data Analytics

Python Symbolic Execution with LLM-powered Code Generation

Pbse: Phase-Based Symbolic Execution

Design and Implementation of a Dynamic Symbolic Execution Tool for Windows Executables

Experience report: how is dynamic symbolic execution different from manual testing? a study on KLEE

Distributed Symbolic Execution for Binary Software Testing

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Characteristic Studies of Loop Problems for Structural Test Generation Via Symbolic Execution

Is Function Similarity Over-Engineered? Building a Benchmark

State of the art: Dynamic symbolic execution for automated test generation

Selective Symbolization Based Efficient Symbolic Execution

Dependence Guided Symbolic Execution

Crashmaker