CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models

Xingbo Wang, Renfei Huang, Zhihua Jin, Tianqing Fang, Huamin Qu
DOI: https://doi.org/10.1109/TVCG.2023.3327153
2023-07-24
Abstract: Recently, large pretrained language models have achieved compelling performance on commonsense benchmarks. Nevertheless, it is unclear what commonsense knowledge the models learn and whether they solely exploit spurious patterns. Feature attributions are popular explainability techniques that identify important input concepts for model outputs. However, commonsense knowledge tends to be implicit and rarely explicitly presented in inputs. These methods cannot infer models' implicit reasoning over mentioned concepts. We present CommonsenseVIS, a visual explanatory system that utilizes external commonsense knowledge bases to contextualize model behavior for commonsense question-answering. Specifically, we extract relevant commonsense knowledge in inputs as references to align model behavior with human knowledge. Our system features multi-level visualization and interactive model probing and editing for different concepts and their underlying relations. Through a user study, we show that CommonsenseVIS helps NLP experts conduct a systematic and scalable visual analysis of models' relational reasoning over concepts in different situations.
Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
The paper asks how to understand and interpret the behavior of natural language processing (NLP) models on commonsense reasoning tasks, and in particular whether these models truly possess commonsense knowledge and can use it effectively for reasoning. It focuses on three aspects:

1. **Revealing the commonsense reasoning ability of models**:
   - Large pretrained language models (such as BERT, GPT, and T5) perform well on commonsense benchmarks, but their internal mechanisms lack transparency, so it is difficult to tell what commonsense knowledge they have actually learned.
   - Existing feature attribution methods (such as LIME and SHAP) can explain how important each input feature is to a model output, but they cannot effectively explain implicit commonsense relations.
2. **Systematic and scalable analysis of model behavior**:
   - The commonsense knowledge space is vast and complex, with many relations and contexts linking concepts, so existing explanation methods struggle to build high-level abstractions of model behavior and to generalize to large-scale datasets.
   - A method is needed to systematically analyze how a model understands and reasons about various concepts and their relations in different situations.
3. **Diagnosing and editing the model**:
   - Understanding the model's reasoning process is not enough; it should also be possible to actively inject and update specific knowledge in the model to improve its performance.
   - Interactive tools are needed to explore and edit the model, so that its deficiencies in particular knowledge areas can be found and addressed.

To solve these problems, the authors propose a visual explanation system named **CommonsenseVIS**.
This system combines an external commonsense knowledge base (such as ConceptNet), extracts the implicit commonsense knowledge in input data, and uses multi-level visualization together with interactive exploration and editing to help users understand, diagnose, and improve the commonsense reasoning ability of NLP models.

### Main contributions

- **CommonsenseVIS**: A visual analysis system that supports systematic and scalable analysis, especially for commonsense tasks involving a large number of concepts and their relations. It helps align model behavior with human reasoning through model contextualization, multi-level visualization, and interactive exploration and editing.
- **User study**: Demonstrates the effectiveness and usability of the system in revealing, diagnosing, and editing the underlying commonsense knowledge that language models have not learned.

Through these contributions, the paper aims to improve the transparency and reliability of NLP models, making them more suitable for practical applications.
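
The idea of using an external knowledge base to surface relations that the input never states can be illustrated with a minimal sketch. It assumes a tiny in-memory list of (head, relation, tail) triples; the triples and the names `Triple`, `TOY_KB`, and `linking_triples` are hypothetical, chosen for illustration, and are not taken from the paper or from ConceptNet's API.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    """One knowledge-base fact: head --relation--> tail."""
    head: str
    relation: str
    tail: str


# Hypothetical toy knowledge base; a real one such as ConceptNet is far larger.
TOY_KB = [
    Triple("bird", "CapableOf", "fly"),
    Triple("bird", "AtLocation", "nest"),
    Triple("fish", "AtLocation", "water"),
    Triple("fly", "HasPrerequisite", "wing"),
]


def linking_triples(
    question_concepts: Iterable[str],
    answer_concepts: Iterable[str],
    kb: Iterable[Triple],
) -> List[Triple]:
    """Return the triples that connect a question concept to an answer concept.

    A triple counts as a link if one end is a question concept and the
    other end is an answer concept, in either direction.
    """
    q = set(question_concepts)
    a = set(answer_concepts)
    return [
        t
        for t in kb
        if (t.head in q and t.tail in a) or (t.head in a and t.tail in q)
    ]


# Question mentions "bird" and "sky"; the candidate answer mentions "fly".
links = linking_triples(["bird", "sky"], ["fly"], TOY_KB)
for t in links:
    print(f"{t.head} --{t.relation}--> {t.tail}")
```

In this sketch the relation between "bird" and "fly" appears in neither the question nor the answer text, which is the kind of implicit knowledge that token-level feature attribution alone cannot expose. How CommonsenseVIS actually extracts and aligns such knowledge is described in the paper, not here.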