Abstract: Recently, large pretrained language models have achieved compelling performance on commonsense benchmarks. Nevertheless, it is unclear what commonsense knowledge the models learn and whether they solely exploit spurious patterns. Feature attributions are popular explainability techniques that identify important input concepts for model outputs. However, commonsense knowledge tends to be implicit and rarely explicitly presented in inputs. These methods cannot infer models' implicit reasoning over mentioned concepts. We present CommonsenseVIS, a visual explanatory system that utilizes external commonsense knowledge bases to contextualize model behavior for commonsense question-answering. Specifically, we extract relevant commonsense knowledge in inputs as references to align model behavior with human knowledge. Our system features multi-level visualization and interactive model probing and editing for different concepts and their underlying relations. Through a user study, we show that CommonsenseVIS helps NLP experts conduct a systematic and scalable visual analysis of models' relational reasoning over concepts in different situations.

What problem does this paper attempt to address?

This paper asks how to understand and interpret the behavior of natural language processing (NLP) models on commonsense reasoning tasks, and in particular whether these models truly possess commonsense knowledge and can use it effectively when reasoning. Specifically, the paper focuses on the following aspects:

1. **Revealing the commonsense reasoning ability of models**
   - Large pre-trained language models (such as BERT, GPT, and T5) perform well on commonsense benchmarks, but their internal mechanisms lack transparency, so it is difficult to tell what commonsense knowledge they have actually learned.
   - Existing feature attribution methods (such as LIME and SHAP) can explain how important each input feature is to the model output, but they cannot explain implicit commonsense relationships that never appear in the input.
2. **Systematic and scalable analysis of model behavior**
   - The commonsense knowledge space is vast and complex, with many relations and contexts linking concepts, so existing explanation methods struggle to build high-level abstractions of model behavior and to generalize to large datasets.
   - A method is needed to systematically analyze how well the model understands and reasons about various concepts and their relations in different situations.
3. **Diagnosing and editing the model**
   - Understanding the model's reasoning process is not enough; users also need to be able to inject and update specific knowledge in the model to improve its performance.
   - Interactive tools are needed to explore and edit the model, so that its deficiencies in particular knowledge areas can be found and addressed.

To solve these problems, the authors propose a visual explanation system named **CommonsenseVIS**. The system uses an external commonsense knowledge base (such as ConceptNet) to extract the implicit commonsense knowledge in the input data, and offers multi-level visualization together with interactive exploration and editing to help users understand, diagnose, and improve the commonsense reasoning ability of NLP models.

### Main contributions

- **CommonsenseVIS**: A visual analysis system that supports systematic and scalable analysis, and is especially suited to commonsense tasks involving a large number of concepts and their relations. It helps align model behavior with human reasoning through model contextualization, multi-level visualization, and interactive exploration and editing.
- **User study**: Demonstrates the effectiveness and usability of the system in revealing, diagnosing, and editing the underlying commonsense knowledge that language models have not learned.

Through these contributions, the paper aims to improve the transparency and reliability of NLP models, making them more suitable for practical applications.
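
The summary above describes using an external knowledge base to recover relations that link concepts in a question to concepts in an answer, and then comparing those reference relations with what the model attends to. The Python sketch below illustrates that idea only in miniature. Everything in it is an assumption made for illustration: the in-memory `TOY_KB` stands in for a real knowledge base such as ConceptNet, and the names `Triple`, `extract_relations`, and `alignment`, as well as the alignment score itself, are hypothetical and not taken from the paper.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) fact from a knowledge base."""
    head: str
    relation: str
    tail: str


# Hypothetical toy knowledge base; a real system would query ConceptNet or similar.
TOY_KB: List[Triple] = [
    Triple("fridge", "AtLocation", "kitchen"),
    Triple("milk", "AtLocation", "fridge"),
    Triple("fridge", "UsedFor", "cooling food"),
    Triple("bird", "CapableOf", "fly"),
]


def extract_relations(
    question_concepts: Iterable[str],
    answer_concepts: Iterable[str],
    kb: List[Triple] = TOY_KB,
) -> List[Triple]:
    """Return KB triples that connect a question concept to an answer concept,
    in either direction. These act as the human-knowledge reference."""
    q, a = set(question_concepts), set(answer_concepts)
    return [
        t for t in kb
        if (t.head in q and t.tail in a) or (t.head in a and t.tail in q)
    ]


def alignment(relations: List[Triple], important_concepts: Iterable[str]) -> float:
    """Illustrative score (an assumption, not the paper's metric): the fraction of
    reference triples whose head and tail are both among the concepts a feature
    attribution method flagged as important for the model's answer."""
    if not relations:
        return 0.0
    important = set(important_concepts)
    hits = sum(1 for t in relations if t.head in important and t.tail in important)
    return hits / len(relations)


# Example: "Where would you put milk to keep it cold?" with answer "fridge".
references = extract_relations(["milk", "cold"], ["fridge"])
score = alignment(references, ["milk", "fridge"])
```

A low score on many instances that share the same relation type (for example `AtLocation`) would hint that the model is not relying on that kind of knowledge, which is the sort of pattern the system is described as surfacing visually.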

CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models

Evaluating Commonsense in Pre-trained Language Models

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for visual commonsense reasoning

Commonsense Knowledge Transfer for Pre-trained Language Models

KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts

ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense

Visually Grounded Commonsense Knowledge Acquisition

Beyond Language: Learning Commonsense from Images for Reasoning

Joint Answering and Explanation for Visual Commonsense Reasoning

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

VCD: Knowledge Base Guided Visual Commonsense Discovery in Images

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

Multimodal Commonsense Knowledge Distillation for Visual Question Answering

Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR

Things not Written in Text: Exploring Spatial Commonsense from Visual Signals

EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning

Natural Language Processing with Commonsense Knowledge: A Survey

Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning

Multi-Level Knowledge Injecting for Visual Commonsense Reasoning

Unsupervised Deep Structured Semantic Models for Commonsense Reasoning