Software Fault Localization Based on Multi-objective Feature Fusion and Deep Learning

Xiaolei Hu, Dongcheng Li, W. Eric Wong, Ya Zou
2024-11-26
Abstract: Software fault localization remains challenging due to limited feature diversity and low precision in traditional methods. This paper proposes a novel approach that integrates multi-objective optimization with deep learning models to improve both accuracy and efficiency in fault localization (FL). By framing feature selection as a multi-objective optimization problem (MOP), we extract and fuse three critical fault-related feature sets (spectrum-based, mutation-based, and text-based features) into a comprehensive feature fusion model. These features are then embedded within a deep learning architecture, comprising a multilayer perceptron (MLP) and a gated recurrent network (GRN), which together enhance localization accuracy and generalizability. Experiments on the Defects4J benchmark dataset with 434 faults show that the proposed algorithm reduces processing time by 78.2% compared to single-objective methods. Additionally, our MLP and GRN models achieve a 94.2% improvement in localization accuracy compared to traditional FL methods, outperforming the state-of-the-art deep learning-based FL method by 7.67%. Further validation using the PROMISE dataset demonstrates the generalizability of the proposed model, showing a 4.6% accuracy improvement in cross-project tests over the state-of-the-art deep learning-based FL method.
Software Engineering
What problem does this paper attempt to address?
This paper addresses two major challenges in software fault localization (FL):

1. **Insufficient feature extraction**: Existing feature extraction techniques do not fully capture software fault information, which leads to inaccurate fault localization. Traditional methods usually rely on a single type of feature as guidance, leaving a large amount of useful feature information underutilized.
2. **Low model accuracy**: Many existing models localize faults with insufficient accuracy. On large-scale, real-world fault datasets in particular, processing time increases significantly, which hurts the time efficiency of the model.

To solve these problems, the paper proposes a method that combines multi-objective optimization with deep learning models to improve both the accuracy and the efficiency of fault localization. Its main contributions are:

- **Multi-objective feature fusion algorithm**: Effective features are selected from three dimensions (spectrum-based, mutation-based, and text-based features) and combined by a multi-objective feature fusion algorithm that uses voting and weighting, which addresses static feature loss and feature information redundancy.
- **Fault localization model based on deep learning**: A fault localization model built from a multilayer perceptron (MLP) and a gated recurrent network (GRN) improves the accuracy and generalization ability of fault localization.

The experimental results show that, on the Defects4J benchmark dataset, the proposed algorithm reduces processing time by 78.2% compared with single-objective methods, and improves fault localization accuracy by 94.2% compared with traditional methods and by 7.67% compared with the state-of-the-art deep learning-based method.

Validation on the PROMISE dataset also shows the generalization ability of the model, with a 4.6% accuracy improvement in cross-project testing over the state-of-the-art deep learning-based method.
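To make the idea of multi-objective feature selection followed by weighted fusion more concrete, here is a minimal Python sketch. It is not the paper's algorithm. The two objectives (relevance to maximize, redundancy to minimize), the feature names, the statement identifiers, and the weights are all hypothetical and chosen only for illustration. The voting step mentioned in the summary is not modeled.

```python
from typing import Dict, List, Tuple

# Assumed objectives per candidate feature: (relevance, redundancy).
# Relevance is maximized, redundancy is minimized. Both are illustrative.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on both objectives and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Keep the features that no other candidate dominates (sorted by name)."""
    front = []
    for name, score in candidates.items():
        dominated = any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


def fuse_scores(
    family_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weighted sum of per-family suspiciousness scores, ranked high to low."""
    fused: Dict[str, float] = {}
    for family, scores in family_scores.items():
        for statement, value in scores.items():
            fused[statement] = fused.get(statement, 0.0) + weights[family] * value
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate features with made-up objective values.
    candidates = {
        "spectrum_a": (0.90, 0.30),
        "spectrum_b": (0.70, 0.60),  # dominated by spectrum_a
        "mutation_a": (0.80, 0.20),
        "text_a": (0.50, 0.10),
    }
    print(pareto_front(candidates))

    # Hypothetical per-statement scores from the three feature families.
    family_scores = {
        "spectrum": {"s1": 0.2, "s2": 0.8, "s3": 0.5},
        "mutation": {"s1": 0.1, "s2": 0.6, "s3": 0.9},
        "text": {"s1": 0.4, "s2": 0.5, "s3": 0.3},
    }
    weights = {"spectrum": 0.5, "mutation": 0.3, "text": 0.2}
    for statement, score in fuse_scores(family_scores, weights):
        print(statement, round(score, 2))
```

A Pareto front keeps several trade-off solutions instead of collapsing the objectives into one number, which is the general motivation for treating feature selection as a multi-objective problem. In the paper the fused features feed a deep learning model; the simple weighted ranking here only stands in for that stage.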