ProPC: A Dataset for In-Domain and Cross-Domain Proposition Classification Tasks

Mengyang Hu, Pengyuan Liu, Boqiang Lin, Yuting Mao, Ke Xu, Wentao Su
DOI: https://doi.org/10.1007/978-3-030-88480-2_5
2021-01-01
Abstract: Correctly identifying the types of propositions helps to understand the logical relationships between sentences, and is of great significance to natural language understanding, reasoning, and generation. However, previous studies have three limitations: 1) they consider only explicit propositions, while most propositions in texts are implicit; 2) they only detect whether a sentence is a proposition, although it is more meaningful to identify which proposition type it belongs to; 3) they cover only the encyclopedia domain, whereas propositions occur widely across domains. We present ProPC, a dataset for in-domain and cross-domain proposition classification. It consists of 15,000 sentences, labeled with 4 categories and drawn from 5 domains. We define two new tasks: 1) in-domain proposition classification, which is to identify the proposition type of a given sentence (not limited to explicit propositions); 2) cross-domain proposition classification, which takes encyclopedia as the source domain and the other 4 domains as the target domains. We use Matching, BERT, and RoBERTa as baseline methods and run experiments on each task. The results show that machines can indeed learn the characteristics of the various proposition types from explicit propositions and classify implicit propositions, but their ability to generalize across domains still needs to be strengthened. Our dataset, ProPC, is publicly available at https://github.com/NLUSoCo/ProPC.