Mitigating Gender Bias in Code Large Language Models via Model Editing

Zhanyue Qin, Haochuan Wang, Zecheng Wang, Deyuan Liu, Cunhang Fan, Zhao Lv, Zhiying Tu, Dianhui Chu, Dianbo Sui
2024-10-10
Abstract: In recent years, with the maturation of large language model (LLM) technology and the emergence of high-quality programming code datasets, researchers have become increasingly confident in addressing the challenges of program synthesis automatically. However, since most of the training samples for LLMs are unscreened, it is inevitable that LLMs' performance may not align with real-world scenarios, leading to the presence of social bias. To evaluate and quantify the gender bias in code LLMs, we propose a dataset named CodeGenBias (Gender Bias in the Code Generation) and an evaluation metric called FB-Score (Factual Bias Score) based on the actual gender distribution of correlative professions. With the help of CodeGenBias and FB-Score, we evaluate and analyze the gender bias in eight mainstream Code LLMs. Previous work has demonstrated that model editing methods that perform well in knowledge editing have the potential to mitigate social bias in LLMs. Therefore, we develop a model editing approach named MG-Editing (Multi-Granularity model Editing), which includes the locating and editing phases. Our model editing method MG-Editing can be applied at five different levels of model parameter granularity: full parameters level, layer level, module level, row level, and neuron level. Extensive experiments not only demonstrate that our MG-Editing can effectively mitigate the gender bias in code LLMs while maintaining their general code generation capabilities, but also showcase its excellent generalization. At the same time, the experimental results show that, considering both the gender bias of the model and its general code generation capability, MG-Editing is most effective when applied at the row and neuron levels of granularity.
Software Engineering, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the issue of gender bias in code generation large language models (Code LLMs). Specifically, due to the majority of training samples being unscreened, these models may exhibit social biases, particularly gender bias, when generating code. This bias can not only result in discriminatory code but also have profound social impacts on affected groups. Therefore, the paper proposes a new method to evaluate and mitigate gender bias in these models.

### Main Contributions

1. **Dataset Construction**: The paper constructs a dataset named "CodeGenBias" to detect gender bias in code generation models. This dataset includes 555 training samples, 277 development samples, and 3328 test samples.
2. **Proposed Evaluation Metric**: The paper introduces a new evaluation metric called "FB-Score" (Factual Bias Score) to more accurately reflect the consistency of model outputs with real-world situations, rather than pursuing absolute fairness.
3. **Multi-Granularity Model Editing Method**: The paper proposes a multi-granularity model editing method named "MG-Editing," which can identify and adjust parameters related to gender bias at five different levels of parameter granularity, thereby effectively mitigating gender bias while maintaining the model's code generation capabilities.

### Method Overview

1. **Dataset Construction**:
   - The dataset is constructed using 320 different professions and 5 different types of modifiers.
   - The template includes two fully completed examples and one example that needs to be filled with specific professions and modifiers.
2. **Evaluation Metric**:
   - FB-Score evaluates gender bias by comparing the gender tendencies generated by the model with the gender distribution of related professions in the real world.
3. **Multi-Granularity Model Editing Method**:
   - **Localization Phase**: Identifying parameters related to gender bias at five different levels of granularity (full parameters, layer, module, row, neuron).
   - **Editing Phase**: Fine-tuning these key parameters to make the model's gender tendencies more consistent with the real-world gender distribution.

### Experimental Results

- **Benchmarking**: The paper conducts large-scale evaluation experiments on current mainstream code generation LLMs, showing that CodeGemma-2B has the lowest gender bias, while CodeGen-350M-mono performs the worst.
- **Limitations of Existing Model Editing Methods**: Classic model editing methods (such as ROME) perform poorly in mitigating gender bias and may even significantly increase the model's FB-Score.
- **Effectiveness of MG-Editing**: Experimental results demonstrate that the MG-Editing method can effectively mitigate gender bias in code generation models while maintaining their code generation capabilities.

### Conclusion

By constructing a dataset, proposing a new evaluation metric, and developing a multi-granularity model editing method, the paper systematically addresses the issue of gender bias in code generation models. These methods not only effectively reduce gender bias but also show good generalization capabilities.
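
### Illustrative Sketch: Locate-Then-Edit at Row Granularity

The two-phase idea described under Method Overview (locate the parameters tied to the unwanted behavior, then fine-tune only those) can be illustrated with a toy example. The sketch below is not the paper's implementation. It assumes a single small weight matrix in place of a Code LLM, a squared-error loss in place of the paper's bias objective, and gradient norm as the row-scoring criterion, since the summary does not state how MG-Editing scores parameters. All function names (`loss_and_grad`, `locate_rows`, `edit_rows`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def loss_and_grad(W: Matrix, x: List[float], target: List[float]) -> Tuple[float, Matrix]:
    """Squared-error loss 0.5 * ||W x - target||^2 and its gradient w.r.t. W.

    This loss is a stand-in (assumption) for whatever objective measures the
    gap between model behavior and the desired distribution.
    """
    y = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    err = [yi - ti for yi, ti in zip(y, target)]
    loss = 0.5 * sum(e * e for e in err)
    grad = [[e * xj for xj in x] for e in err]  # dL/dW[i][j] = err[i] * x[j]
    return loss, grad


def locate_rows(grad: Matrix, k: int) -> List[int]:
    """Locating phase (assumed criterion): pick the k rows with the largest gradient norm."""
    norms = [sum(g * g for g in row) ** 0.5 for row in grad]
    top = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)[:k]
    return sorted(top)


def edit_rows(W: Matrix, x: List[float], target: List[float],
              rows: List[int], lr: float = 0.1, steps: int = 50) -> Matrix:
    """Editing phase: gradient descent that updates only the located rows."""
    W = [row[:] for row in W]  # copy so the original weights stay untouched
    for _ in range(steps):
        _, grad = loss_and_grad(W, x, target)
        for i in rows:
            W[i] = [w - lr * g for w, g in zip(W[i], grad[i])]
    return W


# Toy data: row 2 produces by far the largest error against the target.
W = [[0.5, -0.2], [0.1, 0.3], [0.9, 0.4]]
x = [1.0, 2.0]
target = [0.0, 0.6, 0.0]

loss_before, grad = loss_and_grad(W, x, target)
rows = locate_rows(grad, k=1)
edited = edit_rows(W, x, target, rows)
loss_after, _ = loss_and_grad(edited, x, target)

print("located rows:", rows)
print("loss before:", loss_before, "loss after:", loss_after)
```

Restricting the update to the located rows leaves every other row unchanged. This mirrors the motivation stated in the summary for editing at fine granularity: adjust the behavior of interest while disturbing as little of the model's general capability as possible.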