LEAP: Efficient and Automated Test Method for NLP Software

Mingxuan Xiao, Yan Xiao, Hai Dong, Shunhui Ji, Pengcheng Zhang
2023-08-22
Abstract: The widespread adoption of DNNs in NLP software has highlighted the need for robustness. Researchers proposed various automatic testing techniques for adversarial test cases. However, existing methods suffer from two limitations: weak error-discovering capabilities, with success rates ranging from 0% to 24.6% for BERT-based NLP software, and time inefficiency, taking 177.8s to 205.28s per test case, making them challenging for time-constrained scenarios. To address these issues, this paper proposes LEAP, an automated test method that uses LEvy flight-based Adaptive Particle swarm optimization integrated with textual features to generate adversarial test cases. Specifically, we adopt Levy flight for population initialization to increase the diversity of generated test cases. We also design an inertial weight adaptive update operator to improve the efficiency of LEAP's global optimization of high-dimensional text examples and a mutation operator based on the greedy strategy to reduce the search time. We conducted a series of experiments to validate LEAP's ability to test NLP software and found that the average success rate of LEAP in generating adversarial test cases is 79.1%, which is 6.1% higher than the next best approach (PSOattack). While ensuring high success rates, LEAP significantly reduces time overhead by up to 147.6s compared to other heuristic-based methods. Additionally, the experimental results demonstrate that LEAP can generate more transferable test cases and significantly enhance the robustness of DNN-based systems.
Software Engineering, Computation and Language
What problem does this paper attempt to address?
The paper addresses the testing of deep neural network (DNN) components in natural language processing (NLP) software, and in particular how to generate adversarial test cases efficiently so that errors and vulnerabilities in these systems can be exposed. It identifies two main problems with current methods:

1. **Weak error detection capability**: Existing automated testing techniques achieve success rates of only 0% to 24.6% on BERT-based NLP software.
2. **Low time efficiency**: Current methods take approximately 177.8 to 205.28 seconds to generate a single test case, which makes them hard to use in time-constrained scenarios.

To address these problems, the paper proposes LEAP, an automated testing method that uses Levy flight-based adaptive particle swarm optimization, combined with textual features, to generate adversarial test cases. Specifically, LEAP improves testing efficiency in the following ways:

- Uses Levy flight for population initialization to increase the diversity of generated test cases.
- Designs an adaptive inertia weight update operator to improve the efficiency of global optimization over high-dimensional text examples.
- Introduces a mutation operator based on a greedy strategy to reduce search time.

Experimental results show that LEAP achieves an average success rate of 79.1% in generating adversarial test cases, 6.1% higher than the second-best method (PSOattack). LEAP also reduces time overhead by up to 147.6 seconds compared to other heuristic-based methods, generates more transferable test cases, and significantly enhances the robustness of DNN-based systems.
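To make the idea of Levy flight-based population initialization more concrete, here is a minimal sketch. It is not the paper's implementation: it draws heavy-tailed step lengths with Mantegna's algorithm (a common way to approximate Levy-stable steps) and uses them to decide how many word positions each particle perturbs. The function names `levy_step` and `init_population`, the exponent `beta=1.5`, and the representation of a particle as a list of word positions are all assumptions made for illustration.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step with Mantegna's algorithm (0 < beta < 2)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the ratio below
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def init_population(num_words, pop_size, beta=1.5, seed=0):
    """Hypothetical initializer: each particle is a sorted list of word
    positions to perturb; the count per particle is heavy-tailed."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        # Most particles touch one or two positions; a few touch many.
        k = min(num_words, 1 + int(abs(levy_step(beta, rng))))
        population.append(sorted(rng.sample(range(num_words), k)))
    return population


# Example: 20 particles over a 10-word input.
population = init_population(num_words=10, pop_size=20, seed=1)
```

The heavy tail is the point of the design: compared with Gaussian (Brownian-style) steps, occasional large jumps spread the initial particles over a wider part of the search space, which is one plausible reading of how the initialization increases test case diversity.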
The main contributions of the paper are:

- Proposing a new automated testing method, LEAP, which expands the perturbation range in a controlled way by combining Levy flight and Brownian motion, and plans the search path through the proposed adaptive algorithm and greedy mutation strategy, thereby reducing both time overhead and the number of queries.
- Conducting extensive comparative experiments between LEAP and state-of-the-art automated testing methods, showing that LEAP generates test cases with higher attack success rates in less time.
- Evaluating the effectiveness of adversarial test cases in improving the robustness of DNN-based systems. Experimental results show that adversarial training with test cases generated by LEAP significantly improves the robustness of most victim models, by 9.5% to 13.2%.
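The greedy mutation strategy named in the first contribution can be illustrated with a small sketch. This is an assumption about what "greedy" means here, not the paper's operator: among a set of candidate single-word substitutions, it keeps the one that most reduces a score standing in for the victim model's confidence in the original label, and it counts how many times the score function is queried. The names `greedy_mutate` and `toy_confidence`, the candidate dictionary, and the tiny lexicon are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def greedy_mutate(
    words: List[str],
    candidates: Dict[int, List[str]],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float, int]:
    """Apply the single substitution that lowers `score` the most.

    Returns the mutated word list, its score, and the number of score() calls.
    """
    best_words, best_score = list(words), score(words)
    queries = 1
    for pos, subs in candidates.items():
        for sub in subs:
            trial = list(words)
            trial[pos] = sub
            trial_score = score(trial)
            queries += 1
            if trial_score < best_score:  # keep only strict improvements
                best_words, best_score = trial, trial_score
    return best_words, best_score, queries


def toy_confidence(words: List[str]) -> float:
    """Stand-in for a model query: share of words in a tiny positive lexicon."""
    positive = {"great", "good", "excellent"}
    return sum(w in positive for w in words) / len(words)


sentence = ["a", "great", "and", "good", "movie"]
candidates = {1: ["fine", "grand"], 3: ["decent"]}
mutated, confidence, queries = greedy_mutate(sentence, candidates, toy_confidence)
```

Because each call to `score` corresponds to one query against the model under test, restricting the mutation to a small candidate set and committing to the best single change is one way such an operator could keep both search time and query counts low.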