Abstract: Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules, a selector and a classifier, with the goal of maximizing the mutual information (MMI) between the "selected" text and the document label. Despite their promise, MMI-based methods often pick up on spurious text patterns and result in models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. From an information-theoretic lens, we derive properties of the unaugmented dataset for which our CDA approach would succeed. The effectiveness of CDA is empirically evaluated by comparing against several baselines, including an improved MMI-based rationale schema, on two multi-aspect datasets. Our results show that CDA produces rationales that better capture the signal of interest.

What problem does this paper attempt to address?

The paper asks how counterfactual data augmentation (CDA) can improve the selector in rationale models for natural language processing (NLP) and reduce the selector's reliance on non-causal (spurious) features. Specifically, it proposes a CDA method that requires no human intervention and that lowers the mutual information between non-causal features and document labels, so that the model captures the signal of interest more accurately. This improves rationale models trained with the maximum mutual information (MMI) criterion and keeps them from behaving unreasonably because they have latched onto spurious patterns in the dataset.

### Background and Motivation of the Paper

In research on the interpretability of neural models, providing explanatory text fragments (rationales) has become a popular approach, especially in NLP. Rationale models usually consist of two cooperating modules: a selector and a classifier. The selector picks the rationale from the source document, and the classifier makes its prediction from the selected rationale alone, ignoring the rest of the document. This design aims to make the model interpretable through sparsity and exclusivity.

However, existing rationale models trained with the MMI criterion are often vulnerable to spurious patterns in the dataset, so the text chosen by the selector fails to reflect the true relationship between the input text and the target label. For example, the model may predict that a hotel is very clean on the basis of text praising its convenient location. Such unreasonable explanations reduce the credibility of the model and may also indicate that it generalizes poorly.
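The selector and classifier split described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the paper's model: the cue-word sets, `toy_selector`, `toy_classifier`, and the top-`k` selection rule are assumptions chosen only to show sparsity (few tokens are kept) and exclusivity (the classifier sees nothing else).

```python
from typing import List, Sequence, Tuple

# Hypothetical cue words for a "cleanliness" aspect; not taken from the paper.
CLEAN_WORDS = {"spotless", "clean"}
DIRTY_WORDS = {"filthy", "dirty"}


def toy_selector(tokens: Sequence[str]) -> List[float]:
    """Assumed selector: score 1.0 for cleanliness cue words, 0.0 otherwise."""
    cues = CLEAN_WORDS | DIRTY_WORDS
    return [1.0 if t in cues else 0.0 for t in tokens]


def select_rationale(tokens: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Sparsity: keep only the k highest-scoring tokens, in document order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def toy_classifier(rationale: Sequence[str]) -> int:
    """Exclusivity: the prediction depends on the rationale only."""
    return 1 if any(t in CLEAN_WORDS for t in rationale) else 0


def predict(tokens: Sequence[str], k: int = 1) -> Tuple[int, List[str]]:
    """Run the selector, then classify from the selected tokens alone."""
    rationale = select_rationale(tokens, toy_selector(tokens), k)
    return toy_classifier(rationale), rationale


tokens = "the room was spotless and the location was convenient".split()
label, rationale = predict(tokens, k=1)
print(label, rationale)
```

A selector that instead scored "location" or "convenient" highly would hand the classifier a rationale unrelated to cleanliness, which is the failure mode the summary describes.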
### Proposed Method

To address these problems, the paper proposes a general counterfactual data augmentation (CDA) method. Its core idea is to reduce the mutual information between non-causal features and the target label by generating counterfactual documents. The steps are as follows:

1. **Generate counterfactual documents**: For each original document, use a class-conditioned generative model to generate a new document in which the target label is flipped and the rationale text of the original document is replaced with text generated for the flipped label. This process can be written as:

   \[ Y^c_1 \leftarrow 1 - Y_1; \quad X^c_1 \leftarrow \arg \max_{X_1} p(X_1 \mid 1 - Y_1, X_2) \]

   where \( Y^c_1 \) is the flipped label, \( X^c_1 \) is the newly generated rationale text, and \( X_2 \) is the non-causal feature of the original document.
2. **Construct an augmented dataset**: Combine the generated counterfactual documents with the original documents to form a new, augmented dataset. In the augmented dataset, the mutual information between the non-causal features and the target label is reduced, which helps the selector capture the signal of interest more accurately.

### Experimental Verification

The paper evaluates the CDA method on two multi-aspect datasets, TripAdvisor and RateBeer, which are used to evaluate hotel location and the appearance, smell, and taste of beer, respectively. The results show that rationales produced with CDA capture the signal of interest more effectively than those of the baseline methods (including an improved MMI scheme), without requiring human intervention.

### Conclusion

The CDA method proposed in the paper improves the selector of rationale models and reduces its attention to non-causal features without relying on human intervention. This improves the interpretability of the model and also strengthens its generalization ability and credibility.
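The two steps under Proposed Method can be sketched on a toy dataset. This is a minimal illustration under loud assumptions: each document is reduced to a triple `(x1, x2, y)` of a causal phrase, a spurious phrase, and a binary label; the class-conditioned generative model is replaced by a simple count table (`fit_conditional`); and the phrases and counts are invented for the example. It is not the paper's implementation.

```python
import math
from collections import Counter, defaultdict


def mutual_information(pairs):
    """Empirical mutual information (in nats) between the two items of each pair."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def fit_conditional(data):
    """Count-based stand-in for p(x1 | y, x2), with a p(x1 | y) fallback table."""
    by_context = defaultdict(Counter)
    by_label = defaultdict(Counter)
    for x1, x2, y in data:
        by_context[(y, x2)][x1] += 1
        by_label[y][x1] += 1
    return by_context, by_label


def counterfactual(example, by_context, by_label):
    """Flip the label, keep x2, and take the most likely x1 for the flipped label."""
    _, x2, y = example
    y_cf = 1 - y
    # Fall back to label-only counts if this (label, x2) context was never seen.
    # Assumes both labels occur in the data.
    counts = by_context.get((y_cf, x2)) or by_label[y_cf]
    x1_cf = counts.most_common(1)[0][0]
    return (x1_cf, x2, y_cf)


# Invented toy data: x2 ("central"/"remote") is spuriously correlated with y.
data = (
    [("spotless", "central", 1)] * 4
    + [("filthy", "remote", 0)] * 4
    + [("spotless", "remote", 1)]
    + [("filthy", "central", 0)]
)

by_context, by_label = fit_conditional(data)
augmented = data + [counterfactual(ex, by_context, by_label) for ex in data]

mi_before = mutual_information((x2, y) for _, x2, y in data)
mi_after = mutual_information((x2, y) for _, x2, y in augmented)
mi_causal_after = mutual_information((x1, y) for x1, _, y in augmented)
print(mi_before, mi_after, mi_causal_after)
```

In this toy setting, the mutual information between the spurious feature and the label falls from about 0.19 nats to 0 after augmentation, because every spurious phrase now appears equally often with both labels, while the causal phrase remains fully informative about the label.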

Making a (Counterfactual) Difference One Rationale at a Time
