Making a (Counterfactual) Difference One Rationale at a Time

Mitchell Plyler, Michael Green, Min Chi
DOI: https://doi.org/10.48550/arXiv.2201.05177
2022-01-14
Abstract: Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules: a selector and a classifier with the goal of maximizing the mutual information (MMI) between the "selected" text and the document label. Despite their promises, MMI-based methods often pick up on spurious text patterns and result in models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. From an information theoretic lens, we derive properties of the unaugmented dataset for which our CDA approach would succeed. The effectiveness of CDA is empirically evaluated by comparing against several baselines including an improved MMI-based rationale schema on two multi-aspect datasets. Our results show that CDA produces rationales that better capture the signal of interest.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper asks how counterfactual data augmentation (CDA) can improve the selector of rationale models in natural language processing (NLP) and reduce the selector's reliance on non-causal (spurious) features. Specifically, the paper proposes a CDA method that requires no human intervention. The method aims to lower the mutual information between non-causal features and document labels so that the model captures the signal of interest more accurately. This is intended to improve rationale models trained with the maximum mutual information (MMI) criterion and to keep them from behaving nonsensically because they latch onto spurious patterns in the dataset.

### Background and Motivation of the Paper

In research on the interpretability of neural models, providing explanatory text snippets (rationales) has become a popular approach, especially in natural language processing. Rationale models usually consist of two cooperating modules: a selector and a classifier. The selector extracts the rationale from the source document, and the classifier predicts the label from the selected rationale alone, without seeing the rest of the document. This design aims to make the model interpretable through sparsity (the rationale is a small part of the document) and exclusivity (the prediction depends only on the rationale). However, existing rationale models trained with the MMI criterion are often vulnerable to spurious patterns in the dataset. As a result, the text chosen by the selector can fail to reflect the true relationship between the input text and the target label. For example, a model may predict that a hotel is very clean because the review praises its convenient location. Such unreasonable explanations reduce the model's credibility and may also indicate poor generalization.
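The sketch below makes the spurious-pattern failure concrete. It estimates the mutual information between a non-causal feature (whether a review praises the hotel's location) and the cleanliness label. The review counts and the `mutual_information` helper are hypothetical illustrations and are not taken from the paper. A value above zero means the non-causal feature alone carries information about the label.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(A; B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    # Only observed (a, b) cells are summed; unobserved cells contribute 0.
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a, p_b = count_a[a] / n, count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy reviews as (location_praised, cleanliness_label) pairs.
# Praise for the location co-occurs with positive cleanliness labels, so the
# non-causal feature alone is predictive of the label.
reviews = [(1, 1)] * 40 + [(0, 0)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10
print(round(mutual_information(reviews), 3))  # 0.278
```

A selector that is rewarded only for keeping label information has no built-in reason to ignore a feature like this. That is the failure mode described above.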
### Proposed Method

To address these problems, the paper proposes a general counterfactual data augmentation (CDA) method. Its core idea is to generate counterfactual documents that reduce the mutual information between non-causal features and the target label. The steps are as follows:

1. **Generate counterfactual documents**: For each original document, a class-conditioned generative model produces a new document. In the new document, the target label is flipped and the rationale text of the original document is replaced with newly generated text consistent with the flipped label. The non-causal text is kept unchanged. This process can be written as:
   \[ Y^c_1 \leftarrow 1 - Y_1; \quad X^c_1 \leftarrow \arg\max_{X_1} p(X_1 \mid 1 - Y_1, X_2) \]
   where \( Y_1 \) is the original label, \( Y^c_1 \) is the flipped label, \( X^c_1 \) is the newly generated rationale text, and \( X_2 \) is the non-causal text of the original document.
2. **Construct an augmented dataset**: The generated counterfactual documents are combined with the original documents to form an augmented dataset. In this dataset, the mutual information between non-causal features and the target label is reduced, which helps the selector capture the signal of interest more accurately.

### Experimental Verification

The paper evaluates CDA on two multi-aspect datasets, TripAdvisor and RateBeer. These datasets are used to assess hotel location and beer appearance, smell, and taste, respectively. The results show that rationales produced with CDA capture the signal of interest better than the baselines, including an improved MMI-based rationale scheme. CDA also requires no human intervention.

### Conclusion

The CDA method proposed in the paper improves the selector of rationale models and reduces its attention to non-causal features without relying on human intervention.
This improves the interpretability of the model and may also improve its generalization and credibility.
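The two CDA steps described under Proposed Method can be sketched on a toy version of the hotel example. This is an illustrative sketch under stated assumptions, not the authors' implementation. The `CANDIDATES` table and the `log_prob` function are hypothetical stand-ins for the trained class-conditioned generator. The argmax is also taken over a small finite candidate set rather than by decoding from a model, and the reviews are made up.

Each document is a triple of rationale `X1`, non-causal text `X2`, and label `Y1`. Every counterfactual keeps `X2` and flips `Y1`, so in the augmented set each `X2` value appears equally often with both labels. In this idealized toy, that drives the estimated mutual information between `X2` and `Y1` to zero. With a real generator and real data, the reduction depends on properties of the original dataset, which the paper analyzes.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for a class-conditioned generative model: made-up
# log p(X1 | Y1, X2) scores for a few candidate cleanliness rationales.
# This toy table ignores X2; a trained generator would condition on it.
CANDIDATES = {
    1: {"the room was spotless": -1.0, "clean sheets and towels": -1.5},
    0: {"the bathroom was filthy": -1.2, "dust everywhere": -1.8},
}


def log_prob(x1, y1, x2):
    """Toy log p(X1 | Y1, X2); unknown candidates get probability zero."""
    return CANDIDATES[y1].get(x1, -math.inf)


def counterfactual(doc):
    """Step 1: Y1^c <- 1 - Y1; X1^c <- argmax over X1 of p(X1 | 1 - Y1, X2)."""
    x1, x2, y1 = doc
    y1_cf = 1 - y1
    x1_cf = max(CANDIDATES[y1_cf], key=lambda cand: log_prob(cand, y1_cf, x2))
    return (x1_cf, x2, y1_cf)  # the non-causal text X2 is copied unchanged


def augment(docs):
    """Step 2: the augmented set is the originals plus their counterfactuals."""
    return docs + [counterfactual(doc) for doc in docs]


def mutual_information(pairs):
    """Plug-in estimate of I(A; B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )


# Toy documents as (X1 rationale, X2 non-causal text, Y1 cleanliness label).
docs = (
    [("the room was spotless", "great location", 1)] * 40
    + [("dust everywhere", "far from everything", 0)] * 40
    + [("the bathroom was filthy", "great location", 0)] * 10
    + [("clean sheets and towels", "far from everything", 1)] * 10
)
augmented = augment(docs)

print(round(mutual_information([(x2, y) for _, x2, y in docs]), 3))       # 0.278
print(round(mutual_information([(x2, y) for _, x2, y in augmented]), 3))  # 0.0
```

Keeping `X2` fixed while flipping the label is what breaks the spurious association. A selector trained on the augmented set can no longer gain label information from the location text alone.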