Abstract: Sentiment and emotion detection from the textual communication records of developers has various application scenarios in software engineering (SE). However, commonly used off-the-shelf sentiment/emotion detection tools cannot obtain reliable results in SE tasks, and misunderstanding of technical knowledge has been shown to be the main reason. Researchers have therefore started to manually create labeled SE-related datasets and customize SE-specific methods. However, the scarce labeled data can cover only a very limited lexicon and range of expressions. In this article, we employ emojis as an instrument to address this problem. Unlike manual labels provided by annotators, emojis are self-reported labels that the authors themselves provide to intentionally convey affective states, and are thus suitable indicators of sentiment and emotion in texts. Since emojis have been widely adopted in online communication, a large amount of emoji-labeled text can be easily accessed to help tackle the scarcity of manually labeled data. Specifically, we leverage Tweets and GitHub posts containing emojis to learn representations of SE-related texts through emoji prediction. By predicting the emojis contained in each text, texts that tend to surround the same emoji are represented with similar vectors, which transfers the sentiment knowledge contained in emoji usage to the representations of texts. We then leverage these sentiment-aware representations together with manually labeled data to learn the final sentiment/emotion classifier via transfer learning. Compared to existing approaches, our approach achieves significant improvement on representative benchmark datasets, with average increases of 0.036 and 0.049 in macro-F1 for sentiment and emotion detection, respectively. Further investigation reveals that the large-scale Tweets make a key contribution to the power of our approach. This finding suggests that future research should not unilaterally pursue domain-specific resources but should try to transfer knowledge from the open domain through ubiquitous signals such as emojis. Finally, we present the open challenges of sentiment and emotion detection in SE through a qualitative analysis of texts misclassified by our approach.
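
To make the idea of emojis as self-reported labels concrete, the following is a minimal sketch of how raw posts could be turned into (text, emoji) training pairs for an emoji-prediction task. It is an illustration under stated assumptions, not the preprocessing used in the article: the emoji set `EMOJI_LABELS`, the helper `to_emoji_examples`, and the sample posts are all hypothetical.

```python
# Hypothetical emoji vocabulary for illustration only; the article's
# actual emoji set and preprocessing are not specified in the abstract.
EMOJI_LABELS = {"😂", "👍", "😢", "🎉", "😡"}


def to_emoji_examples(posts):
    """Turn raw posts into (text_without_emoji, emoji_label) pairs.

    Each distinct known emoji in a post yields one training pair, so the
    emoji acts as a label supplied by the post's own author.
    """
    examples = []
    for post in posts:
        found = {ch for ch in post if ch in EMOJI_LABELS}
        if not found:
            continue  # posts without a known emoji carry no label
        # Remove the emojis so a model cannot simply copy the label.
        text = "".join(ch for ch in post if ch not in EMOJI_LABELS)
        text = " ".join(text.split())  # collapse leftover whitespace
        for emoji in sorted(found):
            examples.append((text, emoji))
    return examples


# Made-up sample posts (assumption), in the style of developer chatter.
posts = [
    "Finally fixed the flaky test 🎉",
    "This merge conflict again 😡 😡",
    "Bumped the dependency version",
]
pairs = to_emoji_examples(posts)
for text, emoji in pairs:
    print(repr(text), "->", emoji)
```

Pairs of this kind could serve as the training data for the emoji-prediction step, after which the learned text representations would be reused with the much smaller manually labeled SE dataset.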

Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data
