A Survey of Cross-lingual Sentiment Analysis: Methodologies, Models and Evaluations

Xu Yuemei, Cao Han, Du Wanze, Wang Wenqing
DOI: https://doi.org/10.1007/s41019-022-00187-3
2022-01-01
Data Science and Engineering
Abstract: Cross-lingual sentiment analysis (CLSA) leverages one or several source languages to help low-resource languages perform sentiment analysis, thereby alleviating the lack of annotated corpora in many non-English languages. Along with the development of economic globalization, CLSA has attracted much attention in the field of sentiment analysis, and the last decade has seen a surge of research in this area. Numerous methods, datasets and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art CLSA approaches from 2004 to the present. It lays out the research context of cross-lingual sentiment analysis and elaborates on the following methods in detail: (1) the early main methods of CLSA, including those based on machine translation and its improved variants, parallel corpora, or bilingual sentiment lexicons; (2) CLSA based on cross-lingual word embedding; (3) CLSA based on multi-BERT and other pre-trained models. We further analyze their main ideas, methodologies and shortcomings, and attempt to draw conclusions about their language coverage, datasets and performance. Finally, we look into the future development of CLSA and the challenges facing the research area.