Leveraging Static Analysis for Bug Repair

Ruba Mutasim, Gabriel Synnaeve, David Pichardie, Baptiste Rozière
2023-04-21
Abstract: We propose a method combining machine learning with a static analysis tool (i.e. Infer) to automatically repair source code. Machine learning methods perform well for producing idiomatic source code. However, their output is sometimes difficult to trust, as language models can output incorrect code with high confidence. Static analysis tools are trustworthy, but also less flexible, and they produce non-idiomatic code. In this paper, we propose to fix resource leak bugs in IR space, and to use a sequence-to-sequence model to propose a fix in source code space. We also study several decoding strategies, and use Infer to filter the output of the model. On a dataset of CodeNet submissions with potential resource leak bugs, our method is able to find a function with the same semantics that does not raise a warning with around 97% precision and 66% recall.
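The filtering step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `first_clean_candidate` and `toy_checker` are hypothetical names, and `toy_checker` is a crude string-matching stand-in for actually running Infer on each candidate.

```python
from typing import Callable, Iterable, Optional


def first_clean_candidate(
    candidates: Iterable[str],
    raises_warning: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate the checker accepts, or None if all are rejected."""
    for candidate in candidates:
        if not raises_warning(candidate):
            return candidate
    return None


def toy_checker(java_source: str) -> bool:
    """Hypothetical stand-in for the analyzer (assumption, not Infer's logic):
    flag code that opens a FileReader outside a try-with-resources block."""
    return "new FileReader" in java_source and "try (" not in java_source


# Candidates as a decoder might rank them (invented examples).
candidates = [
    "FileReader r = new FileReader(path); r.read();",
    "try (FileReader r = new FileReader(path)) { r.read(); }",
]
fixed = first_clean_candidate(candidates, toy_checker)
```

Returning `None` when every candidate is rejected is the design choice that trades recall for precision: the pipeline stays silent rather than proposing a fix the checker still flags.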
Software Engineering
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the automatic repair of bugs in source code. Specifically, it proposes a method that combines machine learning with a static analysis tool (Infer) to automatically fix resource leak bugs.

#### Main Contributions:

1. **Generating a Parallel Dataset**: The paper generates a parallel dataset of Java code and Infer IR, and trains a model to decompile Infer IR back to source code.
2. **Automatic Repair of Resource Leak Bugs**: It proposes a method to reliably and automatically fix resource leak bugs in Infer IR.
3. **Combining the Automatic Repair Tool and the Decompiler**: It combines the new automatic repair tool with the decompiler to fix resource leak bugs and recover the repaired Java source code.

Experimental results show that this method can fix over 66% of Infer warnings in the dataset, and 96.9% of the submissions passed unit tests.

### Research Background and Challenges

Without a sufficiently large, high-quality dataset, current models struggle to capture the reasoning embedded in the structure of source code. Additionally, machine learning models may produce results that are difficult to trust, while static analysis tools, although reliable, are less flexible and struggle to propose fixes that follow common programming practice.

### Method Overview

1. **Dataset**: Experiments are conducted using the GitHub BigQuery and CodeNet datasets.
2. **Model Architecture**: A sequence-to-sequence (seq2seq) Transformer model is used, consisting of 6 layers each for the encoder and decoder, with approximately 312 million parameters.
3. **Pre-training**: The model is pre-trained using a denoising autoencoder task to improve its ability to understand IR and generate Java source code.
4. **Automatic Repair**: Bugs are automatically repaired at the IR level, and the decompiler is used to convert the repaired IR back to source code.
5. **Evaluation Metrics**: These include edit distance, compilation accuracy, IR matching rate, and unit test accuracy.

### Conclusion and Future Work

The proposed method can suggest appropriate fixes in 66.4% of code segments, with an accuracy ranging from 96.4% to 97.9%. Planned future work includes extending the approach to other types of bugs, developing automated methods to compare the semantic equivalence of IR, and proposing semantic distances between programs.
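How the two headline numbers relate can be illustrated with a small sketch. It assumes, since the summary does not spell this out, that the 66.4% figure is the share of inputs for which a fix survives filtering, and that the accuracy is the share of those proposed fixes that pass unit tests. The `Outcome` type, the function name, and the toy data are all hypothetical and are not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Outcome:
    proposed_fix: Optional[str]  # None when every candidate was filtered out
    passes_tests: bool           # only meaningful when a fix was proposed


def coverage_and_accuracy(outcomes: List[Outcome]) -> Tuple[float, float]:
    """Return (share of inputs with a proposed fix, share of proposed fixes passing tests)."""
    proposed = [o for o in outcomes if o.proposed_fix is not None]
    coverage = len(proposed) / len(outcomes) if outcomes else 0.0
    accuracy = (
        sum(o.passes_tests for o in proposed) / len(proposed) if proposed else 0.0
    )
    return coverage, accuracy


# Invented toy outcomes, for illustration only.
toy_outcomes = [
    Outcome(proposed_fix="try (...) { ... }", passes_tests=True),
    Outcome(proposed_fix="try (...) { ... }", passes_tests=False),
    Outcome(proposed_fix=None, passes_tests=False),
]
coverage, accuracy = coverage_and_accuracy(toy_outcomes)
```

Under this reading, inputs with no surviving candidate lower coverage but leave accuracy untouched, which is why a strict filter can report high accuracy alongside moderate coverage.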