Mevaker: Conclusion Extraction and Allocation Resources for the Hebrew Language

Vitaly Shalumov, Harel Haskey, Yuval Solaz
2024-03-12
Abstract: In this paper, we introduce summarization MevakerSumm and conclusion extraction MevakerConc datasets for the Hebrew language based on the State Comptroller and Ombudsman of Israel reports, along with two auxiliary datasets. We accompany these datasets with models for conclusion extraction (HeConE, HeConEspc) and conclusion allocation (HeCross). All of the code, datasets, and model checkpoints used in this work are publicly available.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper expands natural language processing (NLP) resources for Hebrew, specifically for the tasks of summarization and conclusion extraction. The authors created two datasets from the reports of the State Comptroller and Ombudsman of Israel: MevakerSumm for summarization and MevakerConc for conclusion extraction. They also constructed two auxiliary datasets, MevakerConcSen for sentence-level conclusion detection and MevakerConcTree for training conclusion allocation. On these datasets they trained two conclusion extraction models, HeConE and HeConEspc, and one conclusion allocation model, HeCross. All code, datasets, and model checkpoints are publicly available. The goal is not only to increase the NLP resources available for Hebrew but also to provide models for less-studied tasks.