Multi-round Counterfactual Generation: Interpreting and Improving Models of Text Classification.

Huajie Zhang, Ying Yuxin, Fuzhen Zhuang, Haiqin Weng, Ying Sun, Zhao Zhang, Yiqi Tong, Yan Liu
DOI: https://doi.org/10.1145/3589335.3651537
2024-01-01
Abstract: In recent years, natural language processing (NLP) models have demonstrated remarkable performance in text classification tasks. However, trusting their decisions requires a deeper understanding of how these networks operate, so there is an urgent need to make these "black boxes" more transparent and interpretable. To this end, we propose a model-agnostic interpretability method named MCG. The method generates counterfactual interpretations that are more faithful to the original model's performance through a multi-round dialogue, in which each new template is generated based on the evaluation of the previous counterfactual interpretation. In addition, MCG offers a way to improve model performance through counterfactual data augmentation in cases where the model being interpreted misclassifies the input, a setting rarely covered by existing counterfactual methods. Extensive experiments on three datasets demonstrate that MCG outperforms current state-of-the-art methods in counterfactual generation for interpretability.
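The multi-round idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword classifier `classify`, the word-flipping generator `propose`, and the use of the round number as feedback are all hypothetical stand-ins, chosen only to show a generate, evaluate, and retry loop against a black-box model.

```python
from typing import Callable, Optional, Tuple

# Toy cue words and substitutions (assumptions for illustration only).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}
ANTONYMS = {"good": "bad", "great": "awful", "love": "hate",
            "bad": "good", "awful": "great", "hate": "love"}


def classify(text: str) -> str:
    """Toy black-box classifier: compares counts of cue words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


def propose(text: str, n_edits: int) -> str:
    """Hypothetical generator: flips the first n_edits cue words.

    Stands in for the dialogue step that produces a new template.
    """
    out, used = [], 0
    for word in text.split():
        if used < n_edits and word.lower() in ANTONYMS:
            out.append(ANTONYMS[word.lower()])
            used += 1
        else:
            out.append(word)
    return " ".join(out)


def multi_round_counterfactual(
    text: str,
    model: Callable[[str], str],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Retry generation until the model's label flips or rounds run out."""
    original = model(text)
    for round_no in range(1, max_rounds + 1):
        # Feedback from the failed previous round: allow one more edit.
        candidate = propose(text, n_edits=round_no)
        # Evaluate the candidate against the model being interpreted.
        if model(candidate) != original:
            return candidate, round_no
    return None, max_rounds


if __name__ == "__main__":
    cf, rounds = multi_round_counterfactual("good plot great cast love it", classify)
    print(cf, rounds)
```

In this sketch the first round changes only one word and the label stays positive, so the loop tries again with a larger edit budget. A real system would replace `propose` with a language-model call and pass richer evaluation feedback between rounds.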