Abstract: We study the problem of interpreting trained classification models in the setting of linguistic data sets. Leveraging a parse tree, we propose to assign least-squares based importance scores to each word of an instance by exploiting syntactic constituency structure. We establish an axiomatic characterization of these importance scores by relating them to the Banzhaf value in coalitional game theory. Based on these importance scores, we develop a principled method for detecting and quantifying interactions between words in a sentence. We demonstrate that the proposed method can aid in interpretability and diagnostics for several widely-used language models.

What problem does this paper attempt to address?

This paper aims to improve the interpretability of natural language processing (NLP) models on linguistic data. Specifically, it proposes a method that uses syntactic structure to explain a trained classification model: each word in a sentence receives a least-squares importance score, and these scores are justified theoretically through the Banzhaf value from cooperative game theory.

### Main Problems and Solutions

1. **Problem Background**:
   - Modern machine-learning models are difficult to understand and debug after training, which raises concerns about their trustworthiness, diagnosis, and robustness.
   - Current explanation methods either simplify the model to make it interpretable, at the cost of prediction accuracy, or treat arbitrary models as black boxes and therefore cannot exploit domain-specific prior knowledge.
2. **Research Objectives**:
   - Propose a new method for explaining NLP models, in particular for quantifying the interactions between words in a sentence.
   - Use syntactic structure (a constituency parse tree) to assign an importance score to each word, giving a clearer picture of how the model arrives at its prediction.
3. **Specific Solutions**:
   - **LS-Tree Values**: The paper proposes a least-squares framework, called LS-Tree values, which computes the importance score of each word by minimizing the sum of squared residuals over the nodes of the parse tree. These scores are related to the Banzhaf value in cooperative game theory, which provides the theoretical support.
   - **Quantifying Interactions**: Building on LS-Tree values, the paper uses Cook's distance to quantify the interaction between sibling nodes (nodes with a common parent) in a sentence.
   - **Experimental Verification**: Experiments on multiple datasets verify the effectiveness of the method and analyze how well different models (linear models, CNN, LSTM, BERT) capture nonlinearity and adversative relations.

### Key Formulas

1. **Least-Squares Problem for LS-Tree Values**:
   \[ \min_{\psi \in \mathbb{R}^d} \sum_{S \in \wp} \Big[ v(S) - \sum_{i \in S} \psi_i \Big]^2 \]
   where \( \wp \) is the collection of word subsets corresponding to the nodes of the parse tree, \( v(S) = f(S) - f(\emptyset) \) is the characteristic function built from the model \( f \), and \( \psi_i \) is the importance score of the \( i \)-th word.
2. **Cook's Distance**:
   \[ D_i = \text{Const.} \cdot (\hat{\beta}_{(i)} - \hat{\beta})^T X^T X (\hat{\beta}_{(i)} - \hat{\beta}) \]
   where \( \hat{\beta} \) is the original least-squares estimate and \( \hat{\beta}_{(i)} \) is the least-squares estimate after deleting the \( i \)-th data point.

### Experimental Results

- **Nonlinearity Analysis**: Comparing the correlation of each model with the linear model shows that BERT is the most nonlinear model.
- **Capturing Adversative Relations**: BERT performs best at capturing adversative relations (signalled by words such as "not" and "but"), especially on smaller datasets.

In general, this paper provides a systematic method for explaining NLP models by introducing LS-Tree values and Cook's distance, and demonstrates its effectiveness on multiple tasks.
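To make the two formulas concrete, the following is a minimal sketch in Python with NumPy, not the paper's implementation. The three-word sentence, its parse tree, and the scoring function `toy_model` are hypothetical stand-ins for a real parser and a trained classifier. The constant in Cook's distance is taken to be 1, and the distance is computed by refitting with one node removed at a time.

```python
import numpy as np

# Hypothetical toy sentence and parse tree (illustrative, not from the paper).
words = ["not", "very", "good"]
# Each node is the tuple of word indices it spans: three leaves,
# the phrase "very good", and the root.
nodes = [(0,), (1,), (2,), (1, 2), (0, 1, 2)]


def toy_model(subset):
    """Stand-in for f(S): the model output when only words in `subset` are kept."""
    s = set(subset)
    score = 0.0
    if 2 in s:
        score += 1.0      # "good" is positive
    if {1, 2} <= s:
        score += 0.5      # "very" amplifies "good"
    if 0 in s and 2 in s:
        score = -score    # "not" flips the polarity
    return score


def design_matrix(nodes, d):
    """One row per node; entry (r, i) is 1 if word i belongs to node r."""
    X = np.zeros((len(nodes), d))
    for r, node in enumerate(nodes):
        X[r, list(node)] = 1.0
    return X


def ls_tree_values(X, v):
    """Solve min_psi sum_S [v(S) - sum_{i in S} psi_i]^2 by least squares."""
    psi, *_ = np.linalg.lstsq(X, v, rcond=None)
    return psi


def cook_distances(X, v):
    """Unnormalised Cook's distance (Const. = 1) for deleting each node's row."""
    full = ls_tree_values(X, v)
    gram = X.T @ X
    out = np.empty(len(v))
    for i in range(len(v)):
        keep = np.arange(len(v)) != i
        diff = ls_tree_values(X[keep], v[keep]) - full
        out[i] = diff @ gram @ diff
    return out


d = len(words)
X = design_matrix(nodes, d)
# v(S) = f(S) - f(empty set)
v = np.array([toy_model(n) - toy_model(()) for n in nodes])
psi = ls_tree_values(X, v)   # one importance score per word
D = cook_distances(X, v)     # one influence score per parse-tree node

for word, score in zip(words, psi):
    print(f"{word}: {score:+.3f}")
for node, dist in zip(nodes, D):
    print(f"{node}: {dist:.3f}")
```

If the model is purely additive in its words, the least-squares fit is exact and every Cook's distance in this sketch is zero, so a large distance at a node signals that the model deviates from additivity there.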

LS-Tree: Model Interpretation When the Data Are Linguistic

Linguistic Modelling Based on Semantic Similarity Relation among Linguistic Labels

When Are Tree Structures Necessary for Deep Learning of Representations?

Linguistic Structure Induction from Language Models

Assessment of Pre-Trained Models Across Languages and Grammars

Revisiting Structured Sentiment Analysis as Latent Dependency Graph Parsing

Evaluating statistical language models as pragmatic reasoners

Active Use of Latent Constituency Representation in both Humans and Large Language Models

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Explaining Datasets in Words: Statistical Models with Natural Language Parameters

LSOIT: Lexicon and Syntax Enhanced Opinion Induction Tree for Aspect-based Sentiment Analysis

Do Neural Language Models Show Preferences for Syntactic Formalisms?

On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification

Language Model as Visual Explainer

Assessing Word Importance Using Models Trained for Semantic Tasks

How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases

Interpretability of Language Models via Task Spaces

Gaussian Tree Constraints Applied to Acoustic Linguistic Functional Data

Integrating Linguistic Theory and Neural Language Models

Finding Structure in Language Models

Linguistic Properties Matter for Implicit Discourse Relation Recognition: Combining Semantic Interaction, Topic Continuity and Attribution