Abstract: Deep Neural Networks (DNNs) have achieved tremendous success in many applications, but they have also been shown to exhibit undesirable behaviors with respect to robustness, privacy, and other trustworthiness concerns. Among these, fairness (i.e., non-discrimination) is an important property, especially when DNNs are applied to sensitive applications (e.g., finance and employment). However, DNNs easily learn spurious correlations between protected attributes (e.g., age, gender, race) and the classification task, and they develop discriminatory behaviors when the training data is imbalanced. Such discriminatory decisions in sensitive applications can have severe social impacts. To expose potential discrimination problems in DNNs before deployment, testing techniques have been proposed to identify discriminatory instances (i.e., instances that exhibit the defined discrimination). However, repairing DNNs after such discrimination has been detected remains challenging. Existing techniques mainly rely on retraining with a large number of discriminatory instances generated by testing methods, which incurs a large time overhead and makes repair inefficient. In this work, we propose Faire, a method that effectively and efficiently repairs fairness issues of DNNs without using additional data (e.g., discriminatory instances). Our basic idea is inspired by traditional program repair methods that synthesize proper condition checks. A typical way to repair a traditional program is to localize its defects and repair the program logic by adding condition checks. Similarly, for DNNs, we try to understand the unfair logic and reformulate it with well-designed condition checks. In this paper, we synthesize a condition that reduces the effect of features relevant to the protected attributes in the DNN.
Specifically, we first perform a neuron-based analysis and check the functionalities of neurons to identify those whose outputs can be regarded as features relevant to the protected attributes and to the original task. Then a new condition layer is added after each hidden layer to penalize neurons that are accountable for the protected features (i.e., intermediate features relevant to protected attributes) and promote neurons that are accountable for the non-protected features (i.e., intermediate features relevant to the original task). The repair rate of Faire can exceed 99%, outperforming other methods, and the whole repair process takes no more than 340 seconds. The evaluation results demonstrate that our approach can effectively and efficiently repair the individual discriminatory instances of the target model.
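To make the idea of a condition layer concrete, the following is a minimal sketch in NumPy and not the paper's actual implementation. It assumes the neuron roles are already known and given as boolean masks, and that the condition is a simple rescaling of activations. The names `condition_layer`, `penalty`, and `boost`, the toy weights, and the two inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def condition_layer(h, protected_mask, task_mask, penalty=0.1, boost=2.0):
    """Hypothetical condition: shrink protected-feature neurons, amplify task neurons."""
    scale = np.ones(h.shape[-1])
    scale[protected_mask] = penalty   # penalize neurons tied to protected attributes
    scale[task_mask] = boost          # promote neurons tied to the original task
    return h * scale

def forward(x, weights, biases, masks=None):
    """Plain MLP forward pass; if masks are given, apply a condition after each hidden layer."""
    h = np.asarray(x, dtype=float)
    for i in range(len(weights) - 1):
        h = relu(h @ weights[i] + biases[i])
        if masks is not None:
            h = condition_layer(h, *masks[i])
    return h @ weights[-1] + biases[-1]

# Toy model (assumed): input = [protected attribute, task feature].
# Hidden neuron 0 copies the protected attribute, hidden neuron 1 copies the task feature.
weights = [np.eye(2), np.array([[2.0], [1.0]])]
biases = [np.zeros(2), np.array([-1.5])]
masks = [(np.array([True, False]), np.array([False, True]))]

def predict(x, masks=None):
    return bool(forward(x, weights, biases, masks)[0] > 0)

# Two inputs that differ only in the protected attribute.
x_a = [0.0, 1.0]
x_b = [1.0, 1.0]

print(predict(x_a), predict(x_b))                # False True  (discriminatory pair)
print(predict(x_a, masks), predict(x_b, masks))  # True True   (same decision after the condition)
```

In this toy setting the original model gives different decisions for two inputs that differ only in the protected attribute, while the rescaled model gives the same decision for both, and no weight is retrained. How Faire actually identifies the neurons and synthesizes the condition is not specified in the abstract.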

Faire: Repairing Fairness of Neural Networks via Neuron Condition Synthesis
