PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction

Majid Asgari-Bidhendi, Mehrdad Nasser, Behrooz Janfada, Behrouz Minaei-Bidgoli
DOI: https://doi.org/10.1155/2021/8893270
2021-03-16
Scientific Programming
Abstract: Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of natural language processing tasks such as information extraction, knowledge extraction, question answering, and knowledge base population. This research is motivated by the lack of a relation extraction dataset for the Persian language and by the need to extract knowledge from the growing volume of Persian-language data for different applications. In this paper, we present “PERLEX” as the first Persian dataset for relation extraction, which is an expert-translated version of the “SemEval-2010-Task-8” dataset. Moreover, this paper addresses Persian relation extraction using state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset, including a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual BERT contextual word representations. The experiments yield a maximum F1-score of 77.66% (achieved by the BERTEM-MTB method), which is the state of the art for relation extraction in the Persian language.
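To make the task concrete, the following is a minimal Python sketch of how one annotated example in the SemEval-2010 Task 8 style could be read. It assumes the usual convention of that dataset, in which the two entities are wrapped in <e1> and <e2> tags and the label is written as Relation(e1,e2) or Relation(e2,e1); whether PERLEX files follow exactly this layout is an assumption here. The function name parse_example, the output keys, and the sample sentence are illustrative and do not come from the paper.

```python
import re

# Hypothetical helper (not from the paper): reads one example annotated in
# the SemEval-2010 Task 8 style, where <e1>/<e2> tags mark the two entities.
ENTITY_RE = re.compile(r"<(e[12])>(.*?)</\1>")
LABEL_RE = re.compile(r"(.+)\((e[12]),(e[12])\)")


def parse_example(sentence, label):
    """Return the relation name and its two arguments in label order."""
    # Map each tag name ("e1", "e2") to the text it encloses.
    entities = {tag: text for tag, text in ENTITY_RE.findall(sentence)}
    match = LABEL_RE.fullmatch(label)
    if match is None:
        # Undirected labels such as "Other" carry no argument order.
        return {"relation": label, "first_arg": None, "second_arg": None}
    relation, first, second = match.groups()
    return {
        "relation": relation,
        "first_arg": entities[first],
        "second_arg": entities[second],
    }


# Made-up sample sentence, for illustration only.
example = parse_example(
    "The <e1>engine</e1> is part of the <e2>car</e2>.",
    "Component-Whole(e1,e2)",
)
print(example)
```

The argument order in the label encodes direction, so Component-Whole(e1,e2) and Component-Whole(e2,e1) are distinct classes for a classifier trained on data of this kind.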
computer science, software engineering