Multi-label Masked Language Modeling on Zero-shot Code-switched Sentiment Analysis

Zhi Li, Xing Gao, Ji Zhang, Yin Zhang
DOI: https://doi.org/10.1145/3477495.3531914
2022-01-01
Abstract: In multilingual communities, code-switching is a common phenomenon, and code-switched tasks have become a crucial area of research in natural language processing (NLP). Existing approaches mainly rely on supervised learning, but annotating a sufficient amount of code-switched data is expensive. In this paper, we consider the zero-shot setting and improve model performance on code-switched tasks using monolingual datasets, unlabeled code-switched datasets, and semantic dictionaries. Inspired by the mechanism of code-switching itself, we propose multi-label masked language modeling, which predicts both the masked word and its synonyms in other languages. Experimental results show that, compared with baselines, our method further improves the pretrained multilingual model's performance on code-switched sentiment analysis datasets.
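The core idea in the abstract, predicting a masked word together with its synonyms in other languages, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the paper: the toy vocabulary, the `synonym_dict` bilingual dictionary, the helper names `build_multilabel_target` and `multilabel_bce`, and the choice of a per-token sigmoid with binary cross-entropy as the multi-label loss. The authors' actual objective and model may differ.

```python
import math

def build_multilabel_target(word, vocab, synonym_dict):
    """Return a multi-hot vector over the vocabulary.

    Entries are 1.0 for the masked word and for each of its
    cross-lingual synonyms that appear in the vocabulary.
    (Hypothetical helper; the dictionary format is an assumption.)
    """
    target = [0.0] * len(vocab)
    for w in [word] + synonym_dict.get(word, []):
        if w in vocab:
            target[vocab[w]] = 1.0
    return target

def multilabel_bce(logits, target):
    """Mean binary cross-entropy with an independent sigmoid per vocab entry.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    This loss is an assumed stand-in for the paper's multi-label objective.
    """
    total = 0.0
    for z, y in zip(logits, target):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Toy English/Spanish vocabulary and dictionary (illustrative only).
vocab = {"good": 0, "bueno": 1, "bad": 2, "malo": 3, "movie": 4}
synonym_dict = {"good": ["bueno"], "bad": ["malo"]}

# Suppose "good" is masked in "the movie was [MASK]".
target = build_multilabel_target("good", vocab, synonym_dict)

# Logits that favor "good"/"bueno" versus logits that favor "bad"/"malo".
logits_match = [4.0, 4.0, -4.0, -4.0, -4.0]
logits_mismatch = [-4.0, -4.0, 4.0, 4.0, -4.0]

loss_match = multilabel_bce(logits_match, target)
loss_mismatch = multilabel_bce(logits_mismatch, target)
```

In this sketch, a prediction that assigns high scores to both the masked word and its translation receives a lower loss than one that favors unrelated words, which is the behavior a multi-label masked language modeling objective would encourage.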