Abstract: Multivariate time-series data in numerous real-world applications (e.g., healthcare and industry) are informative but challenging to model because labels are scarce and dimensionality is high. Recent studies in self-supervised learning have shown the potential to learn rich representations without relying on labels, yet they fall short in learning disentangled embeddings and in addressing inductive bias (e.g., transformation-invariance). To tackle these challenges, we propose TimeDRL, a generic multivariate time-series representation learning framework with disentangled dual-level embeddings. TimeDRL is characterized by three novel features: (i) disentangled derivation of timestamp-level and instance-level embeddings from patched time-series data using a [CLS] token strategy; (ii) utilization of timestamp-predictive and instance-contrastive tasks for disentangled representation learning, with the former optimizing timestamp-level embeddings with a predictive loss and the latter optimizing instance-level embeddings with a contrastive loss; and (iii) avoidance of augmentation methods to eliminate inductive biases, such as the transformation-invariance introduced by cropping and masking. Comprehensive experiments on 6 time-series forecasting datasets and 5 time-series classification datasets show that TimeDRL consistently surpasses existing representation learning approaches, with an average improvement of 58.02% in MSE for forecasting and 1.48% in accuracy for classification. Furthermore, extensive ablation studies confirm the relative contribution of each component in TimeDRL's architecture, and semi-supervised learning evaluations demonstrate its effectiveness in real-world scenarios, even with limited labeled data. The code is available at https://github.com/blacksnail789521/TimeDRL.
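As a rough illustration of feature (i), the data flow from patched input to dual-level embeddings might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`make_patches`, `prepend_cls`, `split_embeddings`), the patch length and stride, and the convention that the [CLS] token sits at position 0 are all hypothetical, and an identity function stands in for the encoder.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def make_patches(series: List[Vector], patch_len: int, stride: int) -> List[Vector]:
    """Flatten each window of `patch_len` timestamps (all channels) into one patch vector."""
    patches = []
    for start in range(0, len(series) - patch_len + 1, stride):
        window = series[start:start + patch_len]
        patches.append([value for step in window for value in step])
    return patches


def prepend_cls(patches: List[Vector], cls_token: Vector) -> List[Vector]:
    """Put a [CLS] token in front of the patch sequence (assumed position: index 0)."""
    if patches and len(cls_token) != len(patches[0]):
        raise ValueError("cls_token must have the same dimension as a patch")
    return [list(cls_token)] + patches


def split_embeddings(encoded: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Read the instance-level embedding from the [CLS] slot and the
    timestamp-level embeddings from the remaining positions."""
    return encoded[0], encoded[1:]


def identity_encoder(tokens: List[Vector]) -> List[Vector]:
    """Placeholder for a real sequence encoder; it returns the tokens unchanged."""
    return [list(token) for token in tokens]


def embed(series: List[Vector], cls_token: Vector, patch_len: int, stride: int,
          encoder: Callable[[List[Vector]], List[Vector]] = identity_encoder):
    tokens = prepend_cls(make_patches(series, patch_len, stride), cls_token)
    return split_embeddings(encoder(tokens))


# Toy input: 6 timestamps, 2 channels.
series = [[float(t), float(t) + 0.5] for t in range(6)]
instance_emb, timestamp_embs = embed(series, cls_token=[0.0] * 4, patch_len=2, stride=2)
```

In this sketch the two embedding levels come from separate output positions, so a predictive loss could be applied to `timestamp_embs` and a contrastive loss to `instance_emb` without either loss touching the other's slot.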

Deep temporal representation learning for language identification

Phonetic Temporal Neural Model for Language Identification

High-resolution Acoustic Modeling and Compact Language Modeling of Language-Universal Speech Attributes for Spoken Language Identification.

Two-stage Training for Chinese Dialect Recognition

Transducer-based language embedding for spoken language identification

Learnable Spectro-temporal Receptive Fields for Robust Voice Type Discrimination

Deep joint learning for language recognition

Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network

A Deep Multi-Task Representation Learning Method for Time Series Classification and Retrieval.

Generative linguistic representation for spoken language identification

Language Recognition using Time Delay Deep Neural Network

Deep LSTM for Large Vocabulary Continuous Speech Recognition

[Modification of empirical antimicrobial regimen during the first 72 hours of hospitalisation].

An Empirical Study of Language Model Integration for Transducer Based Speech Recognition

Improved deep speaker feature learning for text-dependent speaker recognition

Speech Guided Disentangled Visual Representation Learning for Lip Reading

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

OLR 2021 Challenge: Datasets, Rules and Baselines

TimeDRL: Disentangled Representation Learning for Multivariate Time-Series

Deep Discriminative Feature Learning for Accent Recognition

OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer