Part-of-Speech Tagging for Code Mixed English-Telugu Social Media Data

Kovida Nelakuditi, Divya Sai Jitta, Radhika Mamidi
DOI: https://doi.org/10.1007/978-3-319-75477-2_23
2018-01-01
Abstract: Part-of-speech (POS) tagging is a primary and important step in many natural language processing applications. POS taggers have reported high accuracies on grammatically correct monolingual data. This paper reports work on annotating code-mixed English-Telugu data collected from the social media site Facebook and on creating automatic POS taggers for this corpus. We treat POS tagging as a classification problem and use classifiers such as linear SVMs, CRFs, and Multinomial Bayes with different combinations of features that capture both the context of a word and its internal structure. We also report experiments on combining monolingual POS taggers to tag this code-mixed English-Telugu data.
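To illustrate what features capturing "both the context of a word and its internal structure" might look like, here is a minimal sketch in Python. The feature names, the affix length, and the example sentence are assumptions made for illustration; they are not the feature set or data reported in the paper. Dictionaries of this shape could be vectorized and passed to a classifier such as a linear SVM or a CRF.

```python
def word_features(tokens, i, affix_len=3):
    """Build a feature dict for tokens[i].

    Hypothetical feature set for illustration only, not the paper's.
    """
    word = tokens[i]
    lower = word.lower()
    return {
        # Internal structure of the word
        "lower": lower,
        "prefix": lower[:affix_len],
        "suffix": lower[-affix_len:],
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "length": len(word),
        # Context: neighbouring tokens, with sentinels at sentence edges
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [word_features(tokens, i) for i in range(len(tokens))]


# Made-up code-mixed sentence (romanized Telugu with English words)
example = ["nenu", "movie", "chusanu", "yesterday"]
feats = sentence_features(example)
```

Each token yields one dictionary, so a sentence becomes a sequence of feature dictionaries aligned with its POS tags, which suits both per-token classifiers and sequence models.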