Abstract: Sentiment and emotion detection from developers' textual communication records has various application scenarios in software engineering (SE). However, commonly used off-the-shelf sentiment/emotion detection tools cannot obtain reliable results in SE tasks, and misunderstanding of technical knowledge has been demonstrated to be the main reason. Researchers have therefore started to create labeled SE-related datasets manually and to customize SE-specific methods. However, the scarce labeled data can cover only a very limited lexicon and range of expressions. In this article, we employ emojis as an instrument to address this problem. Unlike manual labels provided by annotators, emojis are self-reported labels that authors themselves provide to intentionally convey affective states, and thus are suitable indicators of sentiment and emotion in texts. Since emojis have been widely adopted in online communication, a large amount of emoji-labeled text can be easily accessed to help tackle the scarcity of manually labeled data. Specifically, we leverage Tweets and GitHub posts containing emojis to learn representations of SE-related texts through emoji prediction. By predicting the emojis contained in each text, texts that tend to surround the same emoji are represented with similar vectors, which transfers the sentiment knowledge contained in emoji usage to the representations of texts. We then leverage the sentiment-aware representations as well as manually labeled data to learn the final sentiment/emotion classifier via transfer learning. Compared to existing approaches, our approach achieves significant improvement on representative benchmark datasets, with average increases of 0.036 and 0.049 in macro-F1 in sentiment and emotion detection, respectively. Further investigations reveal that the large-scale Tweets make a key contribution to the power of our approach. This finding informs future research not to unilaterally pursue domain-specific resources but to try to transfer knowledge from the open domain through ubiquitous signals such as emojis. Finally, we present the open challenges of sentiment and emotion detection in SE through a qualitative analysis of texts misclassified by our approach.
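
The improvements above are reported in macro-F1, which averages per-class F1 scores so that every class counts equally regardless of its frequency. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `macro_f1`, the toy labels, and the choice to average over the union of true and predicted labels are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not from the paper)."""
    # Assumption: the label set is the union of labels seen in y_true and y_pred.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three sentiment classes (hypothetical data).
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a gain of a few hundredths in macro-F1 can reflect a noticeable improvement on rare classes, which a frequency-weighted average would dilute.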

Sentiment-Bearing New Words Mining: Exploiting Emoticons and Latent Polarities

Reflections on Sentiment/Opinion Analysis

A Bootstrapping Method for Extracting Sentiment Words Using Degree Adverb Patterns

Every Term Has Sentiment: Learning from Emoticon Evidences for Chinese Microblog Sentiment Analysis

Microblog Sentiment Analysis with Emoticon Space Model

Predicting the semantic orientation of emoticons

SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering

Sentiment Analysis of Chinese Micro-blogs Based on Emoticons and Emotional Words

Sentiment Expression via Emoticons on Social Media

Emotion Mining Research on Micro-Blog

New Word Detection For Sentiment Analysis

Emotion tokens: bridging the gap among multilingual twitter sentiment analysis

Extraction and polarity determination for opinion expression

Towards Building a High-Quality Microblog-Specific Chinese Sentiment Lexicon

Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data

A Novel Sentiment Polarity Detection Framework for Chinese

An Approach of Text Sentiment Analysis for Public Opinion Monitoring System

Creating emoji lexica from unsupervised sentiment analysis of their descriptions

Lexicon-Based Sentiment Analysis on Topical Chinese Microblog Messages

Microblog Sentiment Classification with Heterogeneous Sentiment Knowledge