Large Discourse Treebanks from Scalable Distant Supervision

Patrick Huber, Giuseppe Carenini
DOI: https://doi.org/10.48550/arXiv.2212.06038
2022-10-18
Computation and Language
Abstract: Discourse parsing is an essential upstream task in Natural Language Processing with strong implications for many real-world applications. Despite its widely recognized role, most recent discourse parsers (and consequently downstream tasks) still rely on small-scale human-annotated discourse treebanks, trying to infer general-purpose discourse structures from very limited data in a few narrow domains. To overcome this dire situation and allow discourse parsers to be trained on larger, more diverse and domain-independent datasets, we propose a framework to generate "silver-standard" discourse trees from distant supervision on the auxiliary task of sentiment analysis.