Abstract: Transformer-based models have traditionally been the primary focus of research on time series forecasting. However, the recent emergence of high-performing linear models has cast doubt on the effectiveness of the Transformer architecture for time series forecasting tasks. Most Transformer variants represent time series with time-point-wise tokenization, which does not give the attention mechanism sufficient semantic information. PatchTST enlarges the receptive field of each token through patch-wise tokenization, mitigating this lack of information. However, for multivariate time series forecasting it does not account for the potential impact of delays and correlations between variates on prediction performance. The recently proposed iTransformer addresses the misalignment between variates by employing series-wise tokenization, but its embedding method captures only shallow temporal features. In this work, we propose the Temporal Feature Enhanced Transformer (TFEformer), which deeply integrates patch-wise and series-wise tokenization to enhance the temporal representation of multivariate tokens. Furthermore, we introduce a multi-scale patch fusion mechanism that captures and adaptively integrates temporal features across multiple resolutions. We also enhance the feed-forward network (FFN) module to serve as a temporal feature extractor and introduce variate-wise attention to capture correlations between variates. Extensive experiments on eight real-world datasets demonstrate that TFEformer outperforms existing models and achieves state-of-the-art performance. Further experiments show that TFEformer improves on prior Transformer-based models, with superior generalization ability, better use of extended lookback windows, and effective suppression of distribution shift.
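The three tokenization schemes contrasted in the abstract differ mainly in what a single token covers. The following is a minimal NumPy sketch, assuming an input window of shape (L, N) = (lookback length, number of variates). The function names are hypothetical, and `multiscale_variate_tokens` (random stand-in projections, mean pooling over patches, softmax scale weights) and `variate_attention` (identity Q/K/V projections) are illustrative assumptions only, not TFEformer's published fusion mechanism or attention module.

```python
import numpy as np


def pointwise_tokens(x):
    """Time-point-wise tokenization: one token per time step.

    x: array of shape (L, N) = (lookback length, number of variates).
    Returns L tokens, each an N-dim vector holding all variates at one step.
    """
    return x  # (L, N)


def patchwise_tokens(x, patch_len, stride):
    """Patch-wise tokenization in the spirit of PatchTST (channel-independent):
    each variate is cut into windows of length patch_len taken every `stride` steps.
    Returns an array of shape (N, num_patches, patch_len).
    """
    L, _ = x.shape
    num_patches = (L - patch_len) // stride + 1
    starts = np.arange(num_patches) * stride
    idx = starts[:, None] + np.arange(patch_len)[None, :]  # (num_patches, patch_len)
    return x.T[:, idx]  # (N, num_patches, patch_len)


def serieswise_tokens(x):
    """Series-wise tokenization in the spirit of iTransformer: the whole lookback
    window of one variate is a single token. Returns shape (N, L)."""
    return x.T


def multiscale_variate_tokens(x, patch_lens, d_model, rng):
    """HYPOTHETICAL fusion of patch-wise and series-wise views (not TFEformer's design):
    for each patch length, embed non-overlapping patches with a random stand-in for a
    learned projection, mean-pool over patches, then mix scales with softmax weights.
    Returns one d_model-dim token per variate, shape (N, d_model).
    """
    per_scale = []
    for p in patch_lens:
        patches = patchwise_tokens(x, patch_len=p, stride=p)  # (N, P, p)
        W = rng.standard_normal((p, d_model)) / np.sqrt(p)  # stand-in projection
        per_scale.append((patches @ W).mean(axis=1))  # (N, d_model)
    stacked = np.stack(per_scale)  # (S, N, d_model)
    logits = rng.standard_normal(len(patch_lens))  # stand-in for learned scale scores
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return np.tensordot(weights, stacked, axes=1)  # weighted sum over scales


def variate_attention(tokens):
    """Single-head self-attention across variates with identity Q/K/V projections.
    tokens: (N, d). Returns the attended tokens (N, d) and the (N, N) attention map."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)  # each row sums to 1
    return attn @ tokens, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((96, 7))  # lookback of 96 steps, 7 variates
    print(pointwise_tokens(x).shape)  # (96, 7)
    print(patchwise_tokens(x, 16, 8).shape)  # (7, 11, 16)
    print(serieswise_tokens(x).shape)  # (7, 96)
    tok = multiscale_variate_tokens(x, [8, 16, 32], d_model=64, rng=rng)
    out, attn = variate_attention(tok)
    print(tok.shape, attn.shape)  # (7, 64) (7, 7)
```

Comparing the shapes makes the trade-off concrete. A point-wise token sees a single time step, a patch token sees a short window of one variate, and a series-wise token sees the whole lookback window of one variate. Because there is one series-wise token per variate, attention over those N tokens yields an N x N map, which is what makes it attention between variates.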

TATM: Task-Adaptive Token Matching for Few-Shot Transformer

TFEformer: Temporal Feature Enhanced Transformer for Multivariate Time Series Forecasting

Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning

Feature Transformation for Few-Shot Learning

Multi-level adaptive few-shot learning network combined with vision transformer

Exploring Efficient Few-shot Adaptation for Vision Transformers

Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions

Task-Specific Alignment and Multiple Level Transformer for Few-Shot Action Recognition

Supervised Contrastive Representation Embedding Based on Transformer for Few-Shot Classification

Learning Embedding Adaptation for Few-Shot Learning

Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning

tSF: Transformer-based Semantic Filter for Few-Shot Learning

Task-Adaptive Feature Transformer for Few-Shot Segmentation

Task-Adaptive Feature Transformer with Semantic Enrichment for Few-Shot Segmentation

Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition

Supervised Masked Knowledge Distillation for Few-Shot Transformers

Siamese Transformer Networks for Few-shot Image Classification

Task-aware prototype refinement for improved few-shot learning

Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning

Sparse Spatial Transformers for Few-Shot Learning