Time-Series Contrastive Learning against False Negatives and Class Imbalance

Xiyuan Jin, Jing Wang, Lei Liu, Youfang Lin
2024-08-24
Abstract: As an exemplary self-supervised approach for representation learning, time-series contrastive learning has exhibited remarkable advancements in contemporary research. While recent contrastive learning strategies have focused on how to construct appropriate positives and negatives, in this study, we conduct theoretical analysis and find they have overlooked the fundamental issues: false negatives and class imbalance inherent in the InfoNCE loss-based framework. Therefore, we introduce a straightforward modification grounded in the SimCLR framework, universally adaptable to models engaged in the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via the multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and few-labeled data, we perform semi-supervised consistency classification and enhance the representative ability of minority classes. We compared our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrated our significant advantages in overall performance.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address two core issues in time series contrastive learning: the impact of false negative samples and the class imbalance problem.

1. **Impact of False Negative Samples**: In instance discrimination tasks, unsupervised contrastive learning (UCL) typically assumes that manually designed data augmentations or other views/modalities serve as positive samples, while the remaining samples in the batch are considered negative samples, regardless of their semantic content. Although this simple and efficient method has facilitated the widespread application of contrastive learning, it lacks guarantees for the authenticity and validity of negative samples. In the absence of supervision, negative sample pairs are likely to contain semantically similar or identical samples, i.e., false negative samples. The presence of false negative samples can severely hinder the convergence of feature representation learning.
2. **Class Imbalance Problem**: Existing unsupervised contrastive learning methods usually do not consider the distribution of training data. However, time series data collected in real-world environments, especially physiological time series data, often exhibit imbalanced distributions. For example, the duration of disease occurrences is usually much shorter than non-disease periods, or the occurrence frequency of certain rare diseases is much lower than that of common diseases. Therefore, the availability of time series data for specific classes is inherently limited. Most unsupervised contrastive learning methods perform poorly for critical but less frequent minority classes, necessitating a general representation learning method to improve the representation of these minority classes.

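
To make the two issues concrete, here is a minimal NumPy sketch of an InfoNCE-style instance discrimination loss, in which every other sample in the batch is treated as a negative. It is a simplified one-directional variant for illustration, not the paper's implementation. The function names `info_nce` and `false_negative_rate`, the temperature value, and the use of ground-truth labels (for diagnosis only, never for training) are assumptions introduced here.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE over a batch: z1[i] and z2[i] are two views of instance i.

    Row i treats z2[i] as the only positive and every z2[j], j != i,
    as a negative, regardless of what class j belongs to.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives sit on the diagonal

def false_negative_rate(labels):
    """Fraction of in-batch negative pairs that actually share a class label."""
    labels = np.asarray(labels)
    same_class = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)       # drop each sample's pair with itself
    return same_class[off_diagonal].mean()
```

Under these assumptions, a batch with nine samples of a majority class and one sample of a minority class has a false negative rate of 0.8: most of the pairs that the loss pushes apart belong to the same class, which illustrates how class imbalance and false negatives interact.
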
The paper proposes a semi-supervised instance graph framework based on pseudo-label distribution learning (SIP-LDL). It mitigates the impact of false negative samples by constructing multi-instance discrimination tasks, and it uses graph convolution to strengthen the relationships between instances, thereby improving the classification performance of minority classes. A series of experiments validates the effectiveness of this method, with significant improvements reported on multiple physiological time series datasets, particularly for minority classes.
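
The general idea of multi-instance discrimination over an instance graph can be sketched as follows. This is an illustrative approximation only: the summary does not specify how the paper builds its graph or its loss, so the k-nearest-neighbour graph, the parameter-free propagation step, and the names `knn_graph`, `propagate`, and `multi_positive_loss` are all assumptions. The loss follows the familiar supervised-contrastive form, with graph neighbours standing in for same-class samples.

```python
import numpy as np

def knn_graph(z, k=2):
    """Symmetric k-nearest-neighbour adjacency on cosine similarity (no self-loops)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self from the neighbour search
    idx = np.argsort(-sim, axis=1)[:, :k]          # top-k neighbour indices per row
    adj = np.zeros(sim.shape, dtype=bool)
    adj[np.arange(len(z))[:, None], idx] = True
    return adj | adj.T                             # make the graph undirected

def propagate(z, adj):
    """One parameter-free graph-convolution step: average each node with its neighbours."""
    a = adj.astype(float) + np.eye(len(z))         # add self-loops
    return (a / a.sum(axis=1, keepdims=True)) @ z

def multi_positive_loss(z, positives, temperature=0.5):
    """Supervised-contrastive-style loss: each anchor is pulled toward all marked positives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)              # an anchor is never its own candidate
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = positives & ~np.eye(len(z), dtype=bool)
    n_pos = np.maximum(pos.sum(axis=1), 1)         # avoid division by zero
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / n_pos
    return per_anchor[pos.any(axis=1)].mean()      # skip anchors with no positives
```

Because graph neighbours count as positives rather than negatives, a semantically similar sample in the same batch is no longer pushed away from its anchor, which is the mechanism by which this kind of task reduces the harm from false negatives.
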