Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?

Lev Sorokin, Damir Safin, Shiva Nejati
2024-10-16
Abstract: Search-based software testing (SBST) is a widely adopted technique for testing complex systems with large input spaces, such as Deep Learning-enabled (DL-enabled) systems. Many SBST techniques focus on Pareto-based optimization, where multiple objectives are optimized in parallel to reveal failures. However, it is important to ensure that identified failures are spread throughout the entire failure-inducing area of a search domain and not clustered in a sub-region. This ensures that identified failures are semantically diverse and reveal a wide range of underlying causes. In this paper, we present a theoretical argument explaining why testing based on Pareto optimization is inadequate for covering failure-inducing areas within a search domain. We support our argument with empirical results obtained by applying two widely used types of Pareto-based optimization techniques, namely NSGA-II (an evolutionary algorithm) and OMOPSO (a swarm-based Pareto-optimization algorithm), to two DL-enabled systems: an industrial Automated Valet Parking (AVP) system and a system for classifying handwritten digits. We measure the coverage of failure-revealing test inputs in the input space using a metric that we refer to as the Coverage Inverted Distance quality indicator. Our results show that NSGA-II-based search and OMOPSO are not more effective than a naïve random search baseline in covering test inputs that reveal failures. The replication package for this study is available in a GitHub repository.
Software Engineering, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper asks: **Can Pareto-based search-based software testing (SBST) effectively cover failure-revealing test inputs?** Specifically, the authors focus on three issues:

1. **Applying SBST to complex systems**: The focus is on deep-learning-enabled (DL-enabled) systems, such as an Automated Valet Parking (AVP) system and a handwritten digit classifier. Their input spaces are large and complex, so traditional testing methods struggle to cover all possible failure-revealing test inputs.
2. **Effectiveness of Pareto optimization**: Many SBST techniques rely on Pareto optimization, which optimizes several objective functions simultaneously to reveal failures. Existing research has not fully explored whether these techniques cover the entire failure-inducing region or only concentrate on certain sub-regions of it.
3. **Relationship between diversity and coverage**: An ideal testing method should identify failure-revealing test inputs that are spread across the failure-inducing region rather than clustered in one sub-region. A wider spread helps uncover more diverse failure causes and conditions.

To answer these questions, the authors carried out the following work:

- **Theoretical argument**: They define the SBST problem and state two assumptions (A1 and A2), then use them to explain why a Pareto-based SBST algorithm cannot achieve high coverage of the failure-revealing test input space.
- **Empirical study**: They apply two widely used Pareto-optimization algorithms (NSGA-II and OMOPSO) to two DL-enabled systems, measure how well each covers failure-revealing test inputs, and compare the results with a random-search baseline.
- **A new metric**: They propose the Coverage Inverted Distance (CID) to evaluate how well different algorithms cover the failure-inducing region.

The experimental results show that, although NSGA-II and OMOPSO can improve coverage in some cases when combined with diversity-oriented fitness functions and re-seeding operators, they still do not outperform simple random search. This indicates that existing Pareto-based SBST techniques are limited in their ability to cover failure-revealing test inputs.

### Key formulas

- **Fitness function vector**:
\[ F: D \mapsto \mathbb{R}^m, \quad F(x)=(f_1(x), f_2(x), \ldots, f_m(x)) \]
where each \( f_i \) is a scalar fitness function and \( D \subseteq \mathbb{R}^n \) is the search domain.
- **Pareto dominance relationship**: writing \( F(x)=(v_1, \ldots, v_m) \) and \( F(x')=(v'_1, \ldots, v'_m) \),
\[ x \text{ dominates } x' \iff \exists i.\ (v_i < v'_i) \land \forall j.\ (v_j \leq v'_j) \]
That is, \( x \) is at least as good as \( x' \) on every fitness function and strictly better on at least one, with lower fitness values treated as better.
- **Coverage Inverted Distance (CID)**:
\[ \text{CID}=\frac{1}{|R|} \sum_{r \in R} \min_{s \in S} d(r, s) \]
where \( R \) is the reference set, \( S \) is the solution set, and \( d(r, s) \) is the distance between points \( r \) and \( s \). A lower CID means that every reference point has a solution close to it, so the solution set covers the reference set better.

Together, these contributions expose the limitations of Pareto-based SBST techniques in covering failure-revealing test inputs and suggest directions for future research.
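
The Pareto dominance relation and the CID indicator given under the key formulas can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `dominates` and `cid`, the choice of Euclidean distance for \( d \), and the toy point sets are all assumptions made for illustration. Dominance follows the formula as written, so lower fitness values count as better.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def dominates(v: Point, w: Point) -> bool:
    """Return True if fitness vector v Pareto-dominates w.

    Follows the dominance formula: v is no worse than w in every
    component and strictly better (smaller) in at least one.
    """
    if len(v) != len(w):
        raise ValueError("fitness vectors must have the same length")
    no_worse = all(a <= b for a, b in zip(v, w))
    strictly_better = any(a < b for a, b in zip(v, w))
    return no_worse and strictly_better


def cid(
    reference: Sequence[Point],
    solutions: Sequence[Point],
    distance: Callable[[Point, Point], float] = math.dist,
) -> float:
    """Coverage Inverted Distance: mean distance from each reference
    point to its nearest solution point.

    Euclidean distance (math.dist) is an assumed default; the formula
    only requires some distance d(r, s).
    """
    if not reference or not solutions:
        raise ValueError("reference and solution sets must be non-empty")
    total = sum(min(distance(r, s) for s in solutions) for r in reference)
    return total / len(reference)


# Hypothetical toy data: four reference points on the unit square.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.1, 0.0)]   # solutions packed into one corner
spread = [(0.0, 0.0), (1.0, 1.0)]      # solutions at opposite corners

print("CID clustered:", cid(reference, clustered))
print("CID spread:   ", cid(reference, spread))
```

In this toy setting the spread solution set obtains the lower CID, which matches the intuition behind the indicator: solutions clustered in one sub-region leave distant reference points uncovered, and each uncovered point adds a large nearest-neighbor distance to the average.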