Keeping in Time: Adding Temporal Context to Sentiment Analysis Models

Dean Ninalga
2023-09-24
Abstract: This paper presents a state-of-the-art solution to the LongEval CLEF 2023 Lab Task 2: LongEval-Classification. The goal of this task is to improve and preserve the performance of sentiment analysis models across shorter and longer time periods. Our framework feeds date-prefixed textual inputs to a pre-trained language model, where the timestamp is included in the text. We show that date-prefixed samples better condition model outputs on the temporal context of the respective texts. Moreover, we further boost performance by performing self-labeling on unlabeled data to train a student model. We augment the self-labeling process using a novel augmentation strategy leveraging the date-prefixed formatting of our samples. We demonstrate concrete performance gains on the LongEval-Classification evaluation set over non-augmented self-labeling. Our framework achieves a 2nd place ranking with an overall score of 0.6923 and reports the best Relative Performance Drop (RPD) of -0.0656 over the short evaluation set.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to keep sentiment analysis models accurate as the data they see drifts over time. Conventional sentiment classifiers have no notion of when a text was written, so their predictions degrade as language use changes. The proposed remedy is to prefix each input text with its date, which conditions a pre-trained language model on the temporal context of the text. Performance is boosted further through self-labeling: the model labels unlabeled data, which is then used to train a student model, and the timestamps in the date prefixes are randomly modified during this process as an augmentation strategy. On the LongEval-Classification evaluation set, this combination improves on non-augmented self-labeling, ranks 2nd overall with a score of 0.6923, and achieves the best Relative Performance Drop (RPD) on the short evaluation set, at -0.0656.