Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Songlin Hu
Abstract: Commonsense question answering aims to answer questions which require background knowledge that is not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on the evidence. Recent works either learn to generate evidence from human-annotated evidence which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases which fails to take advantages of both sources. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence. Specifically, we extract evidence from both structured knowledge base (i.e. ConceptNet) and Wikipedia plain texts. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to re-define the distance between words for learning better contextual word representations. The second module adopts graph convolutional network to encode neighbor information into the representations of nodes, and aggregates evidence with graph attention mechanism for predicting the final answer. Experimental results on CommonsenseQA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the CommonsenseQA leaderboard.
What problem does this paper attempt to address?
This paper addresses how to extract evidence from heterogeneous external knowledge sources for commonsense question answering, and how to reason over that evidence to answer questions. Specifically, it uses a structured knowledge base (ConceptNet) and plain-text knowledge (Wikipedia sentences) to strengthen the model's reasoning, so that it can answer questions whose required background knowledge is not stated in the question itself.
### Paper Background and Objectives
Commonsense question answering is a challenging task that requires a machine to gather background knowledge and reason over it to answer questions. Most existing methods take only the current data point as input and ignore the "evidence" that could be extracted from background knowledge. This limits their effectiveness on questions that require extensive background knowledge.
### Main Contributions
1. **Introduction of a Graph-Based Method**: The paper proposes a graph-based method that extracts evidence from heterogeneous knowledge sources (a structured knowledge base and unstructured text) and uses it for commonsense question answering.
2. **Graph-Based Contextual Representation Learning Module**: Graph structural information is used to re-define the distance between words, which yields better contextual word representations.
3. **Graph-Based Reasoning Module**: A graph convolutional network (GCN) encodes neighbor information into node representations, and a graph attention mechanism aggregates the evidence to predict the final answer.
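
To make the third contribution more concrete, here is a minimal pure-Python sketch of one neighbor-aggregation step followed by attention pooling. It is a toy under stated assumptions, not the paper's implementation: the mean aggregation with self-loops, the identity weight matrix, the dot-product scoring, and the names `gcn_layer` and `attention_pool` are all hypothetical choices made for illustration.

```python
import math


def gcn_layer(features, adjacency, weight):
    """Simplified GCN step (assumed form): mean over self + neighbors, linear map, ReLU."""
    n = len(features)
    dim = len(features[0])
    out_dim = len(weight[0])
    output = []
    for i in range(n):
        # Neighborhood of node i, including a self-loop.
        neigh = [j for j in range(n) if adjacency[i][j] or j == i]
        mean = [sum(features[j][d] for j in neigh) / len(neigh) for d in range(dim)]
        # Linear projection followed by ReLU.
        proj = [sum(mean[d] * weight[d][k] for d in range(dim)) for k in range(out_dim)]
        output.append([max(0.0, v) for v in proj])
    return output


def attention_pool(node_states, query):
    """Softmax over dot-product scores, then a weighted sum of node states."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in node_states]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_states[0])
    pooled = [sum(w * s[d] for w, s in zip(weights, node_states)) for d in range(dim)]
    return pooled, weights


# Toy evidence graph: a chain 0 - 1 - 2 with 2-dimensional node features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights; learned in practice

hidden = gcn_layer(features, adjacency, identity)
graph_vector, attn = attention_pool(hidden, query=[1.0, 0.0])
```

In a trained model, the weight matrix and the attention query would be learned, and the pooled vector would feed a classifier that scores each candidate answer.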
### Method Overview
1. **Knowledge Extraction**:
- **Knowledge Extraction from ConceptNet**: Identify the entities in the questions and options, search for paths and construct graphs.
- **Knowledge Extraction from Wikipedia**: Index Wikipedia sentences with Elasticsearch, retrieve the relevant sentences, and construct graphs from them.
2. **Graph-Based Reasoning**:
- **Graph-Based Contextual Representation Learning Module**: Use a topological sorting algorithm to re-order the input evidence so that semantically related words are closer together, which yields better contextual word representations.
- **Graph-Based Reasoning Module**: Aggregate graph information with a graph convolutional network (GCN) and a graph attention mechanism to make the final prediction.
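
The re-ordering step can be illustrated with a standard topological sort. The sketch below uses Kahn's algorithm on a small directed graph. It is only an illustration: the summary does not say which variant the paper uses or how edges are defined, so the function name `topological_order` and the toy nodes and edges are assumptions.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm: return nodes so that every edge (src, dst) has src before dst."""
    indegree = {node: 0 for node in nodes}
    successors = {node: [] for node in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from nodes that have no incoming edges.
    queue = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; no topological order exists")
    return order


# Hypothetical evidence items (labels only) with directed links between them.
nodes = ["c", "a", "b", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "d")]
ordered = topological_order(nodes, edges)  # linked items end up near each other
```

Concatenating the evidence in the returned order, rather than in retrieval order, is what places linked items next to each other in the input sequence.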
### Experimental Results
- **Dataset**: The CommonsenseQA dataset contains 12,102 samples: 9,741 in the training set, 1,221 in the development set, and 1,140 in the test set.
- **Experimental Setup**: The model is built on the pre-trained XLNet-large model, with the input format “<evidence> <sep> question <sep> the answer is <option> <cls>”.
- **Performance**: The model achieves 79.3% accuracy on the development set and 75.3% on the blind test set, outperforming many existing baseline models.
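
The input format in the experimental setup can be sketched as a simple string template, with one sequence built per candidate option. This is a hedged illustration only: the helper names `build_input` and `build_inputs` are hypothetical, and a real pipeline would rely on the tokenizer's own special tokens rather than literal `<sep>` and `<cls>` strings.

```python
def build_input(evidence, question, option):
    """Fill the template '<evidence> <sep> question <sep> the answer is <option> <cls>'."""
    return f"{evidence} <sep> {question} <sep> the answer is {option} <cls>"


def build_inputs(evidence_per_option, question, options):
    """Build one input sequence per candidate option (evidence may differ per option)."""
    return [
        build_input(evidence, question, option)
        for evidence, option in zip(evidence_per_option, options)
    ]


# Illustrative values only; the evidence strings are made up for this sketch.
question = "What are the animals that have hair and do not lay eggs?"
options = ["mammals", "birds"]
evidence = ["mammals have hair", "birds lay eggs"]
sequences = build_inputs(evidence, question, options)
```

Each sequence would then be scored by the model, and the highest-scoring option taken as the prediction.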
### Ablation Experiments
- **Reasoning Components**: Ablation experiments verify the effectiveness of the topological sorting algorithm and the graph reasoning module. Topological sorting alone brings a 1.9% improvement, the graph reasoning module alone brings a 1.4% improvement, and the two combined bring a 3.5% improvement.
- **Knowledge Sources**: ConceptNet and Wikipedia are each evaluated as the sole knowledge source. ConceptNet alone brings a 6.4% improvement and Wikipedia alone brings a 4.6% improvement; combining the two raises the total improvement to 9.4%.
### Case Analysis
A concrete example shows how the model uses heterogeneous knowledge sources to answer a question. For the question “What are the animals that have hair and do not lay eggs?” (answer: “mammals”), the model extracts “mammals are animals” and “mammals have hair” from ConceptNet, and “few mammals lay eggs” from Wikipedia. Together, these pieces of evidence support the correct answer.
### Error Analysis
Fifty error samples were randomly selected for analysis. The main causes of error are:
- **Insufficient Evidence**: Some questions require more background knowledge than the extracted evidence provides, so the evidence cannot support the reasoning.
- **Reasoning Error**: Even when the correct evidence is extracted, the model may still reason incorrectly.
- **Noise Interference**: The extracted evidence may contain noisy information that misleads the model.
In short, the paper proposes a graph-based method that extracts evidence from both ConceptNet and Wikipedia and reasons over it, reaching 75.3% accuracy on the CommonsenseQA test set.