TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style

Deniz Zeyrek,Amália Mendes,Yulia Grishina,Murathan Kurfalı,Samuel Gibbon,Maciej Ogrodniczuk
DOI: https://doi.org/10.1007/s10579-019-09445-9
2019-04-06
Language Resources and Evaluation
Abstract:TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource where TED-talks are annotated at the discourse level in 6 languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which has three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks—which led us to annotate Hypophora, and the decision to avoid projection. We report our annotation consistency, and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.
computer science, interdisciplinary applications
What problem does this paper attempt to address?