Testing the Reasoning Power for NLI Models with Annotated Multi-perspective Entailment Dataset

Dong Yu, Lu Liu, Chen Yu, Changliang Li
DOI: https://doi.org/10.1007/978-3-030-32381-3_2
2019-01-01
Abstract: Natural language inference (NLI) is the challenging task of determining the relationship between a pair of sentences. Existing neural network-based (NN-based) models have achieved considerable success, but few of them are interpretable. In this paper, we propose a Multi-perspective Entailment Category Labeling System (METALs), which consists of three categories and ten sub-categories. We manually annotate 3,368 entailment items and use the annotated data to explain the recognition ability of four NN-based models at a fine-grained level. The experimental results show that all the models perform worse on commonsense reasoning than on the other entailment categories. The largest accuracy difference is 13.22%.
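A fine-grained evaluation of this kind amounts to breaking a model's accuracy down by annotated category. The following is a minimal sketch, assuming each annotated item is stored as a `(category, gold_label, predicted_label)` triple. The function names, the category name `category_b`, and the toy data are hypothetical placeholders, not taken from the paper. Only "commonsense" comes from the abstract. The gap function shows one plausible way to measure how far a target category lags behind the best other category. It is not necessarily how the reported 13.22% was computed.

```python
from collections import defaultdict


def per_category_accuracy(items):
    """Return {category: accuracy} for (category, gold, predicted) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, gold, predicted in items:
        total[category] += 1
        if gold == predicted:
            correct[category] += 1
    return {c: correct[c] / total[c] for c in total}


def max_accuracy_gap(acc, target):
    """Accuracy of the best non-target category minus the target's accuracy.

    Hypothetical gap measure; returns 0.0 when no other category exists.
    """
    others = [a for c, a in acc.items() if c != target]
    return max(others, default=acc[target]) - acc[target]


# Toy, made-up data: annotated items are entailment pairs, so gold is "entailment".
TOY_ITEMS = [
    ("commonsense", "entailment", "entailment"),
    ("commonsense", "entailment", "neutral"),
    ("category_b", "entailment", "entailment"),
    ("category_b", "entailment", "entailment"),
]

if __name__ == "__main__":
    acc = per_category_accuracy(TOY_ITEMS)
    gap = max_accuracy_gap(acc, "commonsense")
    print(f"{gap:.2%}")  # prints "50.00%" for the toy data
```

In practice, you would run this once per model over the same annotated items and compare the per-category dictionaries side by side.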