Figure: workflow diagram with three stages (Representation Learning: Source Language; Representation Learning: Target Language; Supervised Learning). Components shown: Raw Tweets, Emoji Tweets, Word2Vec, Word Embeddings, Sentence Representation Model, Labeled English Docs, Machine Translate, Classification Model.

Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, Xuanzhe Liu
2019-01-01
Abstract: Sentiment classification typically relies on a large amount of labeled data. In practice, the availability of labels is highly imbalanced among different languages, e.g., more English texts are labeled than texts in any other languages, which creates a considerable inequality in the quality of related information services received by users speaking different languages. To tackle this problem, cross-lingual sentiment classification approaches aim to transfer knowledge learned from one language that has abundant labeled examples (i.e., the source language, usually English) to another language with fewer labels (i.e., the target language). The source and the target languages are usually bridged through off-the-shelf machine translation tools. Through such a channel, cross-language sentiment patterns can be successfully learned from English and transferred into the target languages. This approach, however, often fails to capture sentiment knowledge specific to the target language, and thus compromises the accuracy of the downstream classification task. In this paper, we employ emojis, which are widely available in many languages, as a new channel to learn both the cross-language and the language-specific sentiment patterns. We propose a novel representation learning method that uses emoji prediction as an instrument to learn respective sentiment-aware representations for each language. The learned representations are then integrated to facilitate cross-lingual sentiment classification. The proposed method demonstrates state-of-the-art performance on benchmark datasets, which is sustained even when sentiment labels are scarce.