A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms

Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas
DOI: https://doi.org/10.48550/arXiv.2302.02232
2023-02-05
Abstract: This paper addresses the task of extending a given synset with additional synonyms, taking synonymy strength into account as a fuzzy value. Given a mono- or multilingual synset and a threshold (a fuzzy value in [0, 1]), our goal is to extract new synonyms from existing lexicons whose fuzzy value lies above this threshold. Our contribution is twofold: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets, and each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important both for (i) understanding how much linguists (dis)agree on synonymy and (ii) serving as a baseline for evaluating our algorithm. The proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist: its fuzzy values are close to those proposed by the linguists, as measured by RMSE and MAE. The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.
Computation and Language