COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework

Rajvee Sheth, Shubh Nisar, Heenaben Prajapati, Himanshu Beniwal, Mayank Singh
2024-08-06
Abstract: As the NLP community increasingly addresses challenges associated with multilingualism, robust annotation tools are essential to handle multilingual datasets efficiently. In this paper, we introduce a code-mixed multilingual text annotation framework, COMMENTATOR, specifically designed for annotating code-mixed text. The tool demonstrates its effectiveness in token-level and sentence-level language annotation tasks for Hinglish text. We perform robust qualitative human-based evaluations to show that COMMENTATOR led to 5x faster annotations than the best baseline. Our code is publicly available at https://github.com/lingo-iitgn/commentator. The demonstration video is available at https://bit.ly/commentator_video.
Computation and Language, Artificial Intelligence