Abstract: Search-based software testing (SBST) is a widely adopted technique for testing complex systems with large input spaces, such as Deep Learning-enabled (DL-enabled) systems. Many SBST techniques focus on Pareto-based optimization, where multiple objectives are optimized in parallel to reveal failures. However, it is important to ensure that identified failures are spread throughout the entire failure-inducing area of a search domain and not clustered in a sub-region. This ensures that identified failures are semantically diverse and reveal a wide range of underlying causes. In this paper, we present a theoretical argument explaining why testing based on Pareto optimization is inadequate for covering failure-inducing areas within a search domain. We support our argument with empirical results obtained by applying two widely used types of Pareto-based optimization techniques, namely NSGA-II (an evolutionary algorithm) and OMOPSO (a swarm-based Pareto-optimization algorithm), to two DL-enabled systems: an industrial Automated Valet Parking (AVP) system and a system for classifying handwritten digits. We measure the coverage of failure-revealing test inputs in the input space using a metric that we refer to as the Coverage Inverted Distance quality indicator. Our results show that NSGA-II-based search and OMOPSO are not more effective than a naïve random search baseline in covering test inputs that reveal failures. The replication package for this study is available in a GitHub repository.
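The concern that Pareto-based search may leave failures clustered in a sub-region can be made concrete with a toy example. The sketch below is a hypothetical illustration, not the paper's experiment or its formal argument: the unit square stands in for the search domain, an input is assumed to fail when x0 + x1 >= 1, and the two fitness functions (1 - x0, 1 - x1) are invented for the example. It runs a naive random-search baseline, keeps every failing input, and then extracts the failures that are non-dominated under those fitness functions, roughly the subset a Pareto-based search is driven towards. All function names are hypothetical.

```python
import random

# Hypothetical toy setup (not from the paper): the search domain is the unit
# square and an input "fails" when x0 + x1 >= 1.0, so the failure-inducing
# area is the upper-right triangle of the square.
def is_failure(x):
    return x[0] + x[1] >= 1.0

# Two made-up fitness functions to minimise; lower values push the search
# towards the corner (1, 1) inside the failure area.
def fitness(x):
    return (1.0 - x[0], 1.0 - x[1])

def random_search(budget, seed=0):
    """Naive random-search baseline: sample the square uniformly, keep failures."""
    rng = random.Random(seed)
    samples = [(rng.random(), rng.random()) for _ in range(budget)]
    return [x for x in samples if is_failure(x)]

def non_dominated(points):
    """Pareto front (minimisation) for exactly two objectives via a sweep:
    sort by (f1, f2), keep a point only if its f2 beats every earlier point."""
    ranked = sorted(points, key=fitness)
    front, best_f2 = [], float("inf")
    for x in ranked:
        _, f2 = fitness(x)
        if f2 < best_f2:
            front.append(x)
            best_f2 = f2
    return front

failures = random_search(budget=1000)
front = non_dominated(failures)
print(f"{len(failures)} failures found; {len(front)} are non-dominated")
print("smallest max(x0, x1) on the front:", round(min(max(x) for x in front), 3))
```

With these made-up fitness functions one would expect only a handful of the failures to be non-dominated, all lying near the edges x0 = 1 or x1 = 1, whereas the random-search failures spread over the whole triangle. The sorting sweep is a shortcut that only works for two objectives; more objectives need a general pairwise dominance check.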

What problem does this paper attempt to address?

The problem that this paper attempts to address is: **Can Pareto-based search-based software testing (SBST) effectively cover the test inputs that reveal failures?** Specifically, the authors focus on:

1. **Applying SBST to complex systems**: in particular, deep-learning-enabled (DL-enabled) systems such as an Automated Valet Parking (AVP) system and a handwritten-digit classification system. These systems have very large and complex input spaces, which makes it difficult for traditional testing methods to cover all failure-revealing test inputs.
2. **The effectiveness of Pareto optimization**: many SBST techniques rely on Pareto optimization, optimizing multiple fitness functions simultaneously to reveal failures in the system. However, existing research has not fully examined whether these techniques cover the entire failure-inducing region rather than concentrating on a few sub-regions.
3. **The relationship between diversity and coverage**: an ideal testing method should identify diverse failure-revealing test inputs spread across the failure-inducing region, rather than clustered in one sub-region. Such diversity helps uncover a wider range of failure causes and conditions.

To answer these questions, the authors carried out the following work:

- **Theoretical argument**: they define the SBST problem, state two hypotheses (A1 and A2), and use them to explain why Pareto-based SBST algorithms cannot achieve high coverage of the failure-revealing region of the input space.
- **Empirical study**: they apply two widely used Pareto-optimization algorithms, NSGA-II and OMOPSO, to two DL-enabled systems, measure how well each covers failure-revealing test inputs, and compare them against a random-search baseline.
- **A new metric**: they propose the Coverage Inverted Distance (CID) to evaluate how well different algorithms cover the failure-inducing region.

The authors' experimental results show that although NSGA-II and OMOPSO can improve coverage in some cases when diverse fitness functions and re-seeding operators are introduced, they still do not outperform simple random search. This indicates that existing Pareto-based SBST techniques are limited in covering failure-revealing test inputs.

### Key formulas

- **Fitness function vector**:
  \[ F: D \to \mathbb{R}^m, \quad F(x) = (f_1(x), f_2(x), \ldots, f_m(x)) \]
  where each \( f_i \) is a scalar fitness function and \( D \subseteq \mathbb{R}^n \) is the search domain.
- **Pareto dominance** (all fitness functions minimized): let \( v = F(x) = (v_1, \ldots, v_m) \) and \( v' = F(x') = (v'_1, \ldots, v'_m) \). Then
  \[ x \text{ dominates } x' \iff \big(\forall j.\ v_j \leq v'_j\big) \land \big(\exists i.\ v_i < v'_i\big) \]
- **Coverage Inverted Distance (CID)**:
  \[ \text{CID} = \frac{1}{|R|} \sum_{r \in R} \min_{s \in S} d(r, s) \]
  where \( R \) is the reference set, \( S \) is the solution set, and \( d(r, s) \) is the distance between points \( r \) and \( s \). Lower values mean that every reference point has a nearby solution, i.e. better coverage.

Through this work, the authors reveal the shortcomings of Pareto-based SBST techniques in covering failure-revealing test inputs and point to new directions for future research.
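The dominance relation and the CID formula translate directly into code. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: fitness vectors and points are plain tuples, all objectives are minimized, and \( d(r, s) \) is assumed to be the Euclidean distance (`math.dist`), since the summary does not name the metric. The function names `dominates` and `cid` and the example sets are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def dominates(v: Vector, v_prime: Vector) -> bool:
    """True if fitness vector v Pareto-dominates v_prime (minimisation):
    no objective is worse and at least one is strictly better."""
    no_worse = all(a <= b for a, b in zip(v, v_prime))
    strictly_better = any(a < b for a, b in zip(v, v_prime))
    return no_worse and strictly_better

def cid(reference: Sequence[Vector], solutions: Sequence[Vector]) -> float:
    """Coverage Inverted Distance: mean distance from each reference point r
    to its nearest solution s (Euclidean distance assumed for d)."""
    if not reference or not solutions:
        raise ValueError("reference and solution sets must be non-empty")
    total = sum(min(math.dist(r, s) for s in solutions) for r in reference)
    return total / len(reference)

# Hypothetical reference set: the four corners of the unit square.
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
S_clustered = [(1.0, 1.0)]            # all solutions in one corner
S_spread = [(0.0, 0.0), (1.0, 1.0)]   # solutions in two opposite corners

print(round(cid(R, S_clustered), 3))  # → 0.854
print(cid(R, S_spread))               # → 0.5
print(dominates((1, 2), (2, 2)), dominates((1, 3), (2, 2)))  # → True False
```

A single solution in one corner leaves three of the four reference points far away, while spreading solutions over two corners lowers the average distance from about 0.854 to 0.5. This is the kind of difference in coverage that CID is meant to expose, independently of how "good" each individual solution is.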

Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
