RuCoCo: a new Russian corpus with coreference annotation

Vladimir Dobrovolskii, Mariia Michurina, Alexandra Ivoylova
DOI: https://doi.org/10.48550/arXiv.2206.04925
2022-06-10
Abstract: We present a new corpus with coreference annotation, the Russian Coreference Corpus (RuCoCo). The goal of RuCoCo is to obtain a large number of annotated texts while maintaining high inter-annotator agreement. RuCoCo contains news texts in Russian: some were annotated from scratch, and for the rest, machine-generated annotations were refined by human annotators. The corpus contains one million words and around 150,000 mentions. We make the corpus publicly available.
Computation and Language