Abstract: Contrastive learning (CL) has emerged as a promising approach for representation learning in time series data by embedding similar pairs closely while distancing dissimilar ones. However, existing CL methods often introduce false negative pairs (FNPs) by neglecting the inherent characteristics of time series and randomly selecting distinct segments as dissimilar pairs, leading to erroneous representation learning, reduced model performance, and overall inefficiency. To address these issues, we systematically define and categorize FNPs in time series, for the first time, into semantic false negative pairs and temporal false negative pairs: the former arise from overlooking similarities in label categories, which correlate with similarities in non-stationarity, and the latter from neglecting temporal proximity. Moreover, we introduce StatioCL, a novel CL framework that captures non-stationarity and temporal dependency to mitigate both types of FNPs and rectify the inaccuracies in learned representations. By interpreting and differentiating non-stationary states, which reflect how trends and temporal dynamics correlate with underlying data patterns, StatioCL effectively captures semantic characteristics and eliminates semantic FNPs. Simultaneously, StatioCL establishes fine-grained similarity levels based on temporal dependencies to capture varying temporal proximity between segments and mitigate temporal FNPs. Evaluated on real-world benchmark time series classification datasets, StatioCL demonstrates a substantial improvement over state-of-the-art CL methods, achieving a 2.9% increase in Recall and a 19.2% reduction in FNPs. Most importantly, StatioCL also shows enhanced data efficiency and robustness against label scarcity.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
The paper aims to address two main issues in contrastive learning (CL) for time series data:
1. **False Negative Pairs (FNPs)**:
- **Semantic False Negative Pairs**: Because similarity between label categories is ignored, segments that actually belong to the same category are mistakenly treated as negative (dissimilar) pairs, which degrades the learned representations.
- **Temporal False Negative Pairs**: Because temporal proximity is ignored, segments that are close in time, and therefore likely similar, are mistakenly treated as negative pairs, further degrading model performance.
2. **Limitations of Existing Methods**:
- Existing contrastive learning methods typically construct negative pairs by randomly selecting distinct time series segments, which easily introduces false negative pairs and leads to inaccurate representations and lower model performance.
- These methods often overlook inherent characteristics of time series data, such as non-stationarity and temporal dependency, and therefore fail to capture the intrinsic similarities and features of the data.
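To make the failure mode concrete, the sketch below computes the expected share of semantic and temporal FNPs when each anchor's negatives are drawn uniformly from all other segments, as in the random selection described above. The helper `expected_fnp_rates`, the labels, the timestamps, and the `window` threshold for "temporally close" are all hypothetical illustrations, not values or code from the paper.

```python
from itertools import permutations


def expected_fnp_rates(labels, times, window):
    """Expected fraction of semantic and temporal false negatives when every
    other segment is an equally likely negative for each anchor.

    labels: class label of each segment (used only for diagnosis; CL never sees it)
    times:  position of each segment in the original series
    window: hypothetical |Δt| threshold below which two segments count as close
    """
    # All ordered (anchor, negative) pairs with anchor != negative.
    pairs = list(permutations(range(len(labels)), 2))
    # Semantic FNP: the "negative" shares the anchor's class label.
    semantic = sum(labels[i] == labels[j] for i, j in pairs) / len(pairs)
    # Temporal FNP: the "negative" lies within `window` steps of the anchor.
    temporal = sum(abs(times[i] - times[j]) <= window for i, j in pairs) / len(pairs)
    return semantic, temporal


if __name__ == "__main__":
    # Six consecutive segments from a hypothetical activity recording.
    labels = ["walk", "walk", "walk", "run", "run", "sit"]
    times = [0, 1, 2, 3, 4, 5]
    sem, tmp = expected_fnp_rates(labels, times, window=1)
    print(f"semantic FNP rate: {sem:.3f}, temporal FNP rate: {tmp:.3f}")
    # prints "semantic FNP rate: 0.267, temporal FNP rate: 0.333"
```

Even in this tiny example, about a quarter of randomly drawn negatives share the anchor's class and a third are adjacent to it in time; each of these pairs pushes apart representations that should stay close.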
### Solution
To address these issues, the paper proposes **StatioCL**, a new contrastive learning framework built on two key strategies:
1. **Non-Stationary Contrast**:
- Assess the non-stationary state of each time series segment with a statistical test (e.g., the Augmented Dickey-Fuller (ADF) test) and use it as prior knowledge when selecting negative pairs.
- A dedicated loss function pushes apart representations with different non-stationary characteristics in the latent space, eliminating semantic false negative pairs.
2. **Temporal Contrast**:
- Introduce a weighting mechanism that scores the similarity of each negative pair from its temporal difference (Δt), using a Beta distribution to parameterize the weights and capture temporal dependency.
- Tune the parameters of the Beta distribution (α and β) to adapt flexibly to different data types and application scenarios, refining how negative pairs are constructed.
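The two strategies can be sketched together in NumPy as below. This is a minimal illustration under several assumptions, not the paper's implementation: a plain (non-augmented) Dickey-Fuller regression with the asymptotic 5% critical value of about -2.86 stands in for the ADF test; Δt is normalized by a hypothetical `max_lag` and mapped through a Beta CDF so that segments close in time get small negative weights; and both factors simply multiply the negative terms of a standard InfoNCE loss. All function names (`df_is_stationary`, `negative_mask`, `beta_cdf`, `temporal_weights`, `weighted_info_nce`) are hypothetical, and the Beta CDF is integrated numerically with α, β ≥ 1 so the density stays finite.

```python
import numpy as np

DF_CRIT_5PCT = -2.86  # asymptotic 5% critical value (constant, no trend)


def df_is_stationary(x, crit=DF_CRIT_5PCT):
    """Simplified Dickey-Fuller test (no lagged differences, unlike full ADF).
    Regress dy_t on [1, y_{t-1}]; a t-statistic on y_{t-1} below `crit`
    rejects the unit-root null, so the segment is labeled stationary."""
    x = np.asarray(x, dtype=float)
    dy = np.diff(x)
    X = np.column_stack([np.ones(len(dy)), x[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return bool(coef[1] / se < crit)


def negative_mask(states):
    """Non-stationary contrast (assumed form): keep as negatives only segments
    whose stationarity state differs from the anchor's; the diagonal is False."""
    s = np.asarray(states)
    return s[:, None] != s[None, :]


def beta_cdf(u, a, b, grid_size=2001):
    """Numerical Beta(a, b) CDF via cumulative trapezoids; requires a, b >= 1."""
    if a < 1 or b < 1:
        raise ValueError("this sketch requires a >= 1 and b >= 1")
    grid = np.linspace(0.0, 1.0, grid_size)
    pdf = grid ** (a - 1) * (1.0 - grid) ** (b - 1)
    cum = np.concatenate([[0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2)])
    return np.interp(u, grid, cum / cum[-1])


def temporal_weights(times, max_lag, a=2.0, b=2.0):
    """Temporal contrast (assumed form): weight each pair by the Beta CDF of
    its normalized |Δt|, so segments close in time count less as negatives."""
    t = np.asarray(times, dtype=float)
    u = np.clip(np.abs(t[:, None] - t[None, :]) / max_lag, 0.0, 1.0)
    return beta_cdf(u, a, b)


def weighted_info_nce(z1, z2, neg_weights, tau=0.1):
    """InfoNCE with per-pair negative weights; z2[i] is the positive view of z1[i]."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    pos = np.sum(z1 * z2, axis=1) / tau  # anchor-positive logits
    sim = z1 @ z1.T / tau  # anchor-anchor logits
    neg = np.sum(neg_weights * np.exp(sim), axis=1)
    # -log(exp(pos) / (exp(pos) + weighted negatives)), averaged over anchors
    return float(np.mean(np.log(np.exp(pos) + neg) - pos))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_seg, seg_len = 8, 128
    # Hypothetical segments: even ones are white noise, odd ones random walks
    # (a random walk is usually, but not always, labeled non-stationary).
    segments = [rng.normal(size=seg_len) if k % 2 == 0
                else np.cumsum(rng.normal(size=seg_len)) for k in range(n_seg)]
    states = [df_is_stationary(s) for s in segments]
    weights = negative_mask(states) * temporal_weights(
        np.arange(n_seg) * seg_len, max_lag=4 * seg_len)
    z1 = rng.normal(size=(n_seg, 16))  # stand-in for encoder outputs
    z2 = z1 + 0.1 * rng.normal(size=z1.shape)  # stand-in for augmented views
    print(states, weighted_info_nce(z1, z2, weights))
```

Multiplying the stationarity mask by the temporal weights is only one simple way to combine the two signals; the paper may instead use separate loss terms or a different weighting scheme.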
### Experimental Results
The paper conducts experiments on multiple real-world time series classification benchmark datasets, showing that:
- **Performance Improvement**: Compared with state-of-the-art CL methods, StatioCL improves Recall by an average of 2.9% and reduces false negative pairs by 19.2%.
- **Data Efficiency**: With only 10% of fine-tuning data, StatioCL's accuracy is on average 3.1% higher than other contrastive learning methods and 5.4% higher than traditional supervised methods.
- **Robustness**: StatioCL remains more robust when labels are scarce.
### Conclusion
By reducing the impact of false negative pairs, StatioCL learns more accurate and data-efficient representations of time series data, especially when labeled data are scarce. The method offers a new approach to time series classification and may carry these gains over to practical applications.