Abstract: Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and compute requirements are a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture composed exclusively of multi-layer perceptron (MLP) modules for multivariate forecasting and representation learning on patched time series. Inspired by MLP-Mixer's success in computer vision, we adapt it to time series, addressing the challenges this raises and introducing empirically validated components that improve accuracy. These include a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone to explicitly model time-series properties such as hierarchy and channel correlations. We also propose a novel hybrid channel modeling approach and infuse a simple gating mechanism to handle noisy channel interactions effectively and to generalize across diverse datasets. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design makes it compatible with both supervised and masked self-supervised learning methods, making it a promising building block for time-series foundation models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong Patch-Transformer benchmarks by 1-2%, with a significant reduction in memory and runtime (2-3X). The source code of our model is officially released as PatchTSMixer on HuggingFace.
Model: https://huggingface.co/docs/transformers/main/en/model_doc/patchtsmixer
Examples: https://github.com/ibm/tsfm/#notebooks-links
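
To make the abstract's two core ideas concrete (MLP mixing over a patched time series, plus a simple gate that down-weights noisy features), the following is a minimal NumPy sketch. It is not the authors' implementation or the PatchTSMixer API: the function names (`patchify`, `init_params`, `gated_mixer_block`), the ReLU activation, the softmax form of the gate, and the omission of normalization layers and channel mixing are all assumptions made for illustration.

```python
import numpy as np


def patchify(x, patch_len):
    """Split a (channels, seq_len) array into non-overlapping patches.

    Returns shape (channels, num_patches, patch_len); trailing time steps
    that do not fill a whole patch are dropped (an assumption of this sketch).
    """
    channels, seq_len = x.shape
    num_patches = seq_len // patch_len
    return x[:, : num_patches * patch_len].reshape(channels, num_patches, patch_len)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(num_patches, hidden, seed=0):
    """Hypothetical parameter container for one inter-patch mixer block."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (num_patches, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, num_patches)),
        "b2": np.zeros(num_patches),
        "gate_w": rng.normal(0.0, 0.1, (num_patches, num_patches)),
    }


def gated_mixer_block(patches, params):
    """Mix information across patches with a two-layer MLP, gate it, add a residual."""
    # Move the patch axis last so the MLP acts across patches.
    y = np.swapaxes(patches, -1, -2)                      # (channels, patch_len, num_patches)
    h = np.maximum(y @ params["w1"] + params["b1"], 0.0)  # ReLU, assumed activation
    y = h @ params["w2"] + params["b2"]
    # Simple gate: softmax weights over the mixed features, applied elementwise.
    gate = softmax(y @ params["gate_w"], axis=-1)
    y = y * gate
    # Restore the layout and add the residual connection.
    return patches + np.swapaxes(y, -1, -2)


series = np.arange(24, dtype=float).reshape(2, 12)  # 2 channels, 12 time steps
patches = patchify(series, patch_len=4)             # shape (2, 3, 4)
mixed = gated_mixer_block(patches, init_params(num_patches=3, hidden=8))
```

Each channel is processed independently here; the hybrid channel modeling and reconciliation heads described in the abstract would sit on top of blocks like this and are not sketched.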

Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking

iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer

SCHEME: Scalable Channel Mixer for Vision Transformers

HyperMixer: An MLP-based Low Cost Alternative to Transformers

Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems

Rethinking Token-Mixing MLP for MLP-based Vision Backbone

Meta-Transformer: A Unified Framework for Multimodal Learning

TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Demystify Transformers & Convolutions in Modern Image Deep Networks

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Metaformer: A Transformer That Tends to Mine Metaphorical-Level Information

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

Mnemosyne: Learning to Train Transformers with Transformers

AMixer: Adaptive Weight Mixing for Self-attention Free Vision Transformers

EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing

MLP Can Be A Good Transformer Learner

Masked Mixers for Language Generation and Retrieval

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

S$^2$-MLP: Spatial-Shift MLP Architecture for Vision

NiNformer: A Network in Network Transformer with Token Mixing as a Gating Function Generator