Spectral State Space Models

Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan
2024-07-11
Abstract: This paper studies sequence modeling for prediction tasks with long range dependencies. We propose a new formulation for state space models (SSMs) based on learning linear dynamical systems with the spectral filtering algorithm (Hazan et al., 2017). This gives rise to a novel sequence prediction architecture we call a spectral state space model. Spectral state space models have two primary advantages. First, they have provable robustness properties as their performance depends on neither the spectrum of the underlying dynamics nor the dimensionality of the problem. Second, these models are constructed with fixed convolutional filters that do not require learning while still outperforming SSMs in both theory and practice. The resulting models are evaluated on synthetic dynamical systems and long-range prediction tasks of various modalities. These evaluations support the theoretical benefits of spectral filtering for tasks requiring very long range memory.
Machine Learning
What problem does this paper attempt to address?
The paper addresses sequence modeling and prediction for tasks with long-range dependencies. Recurrent neural networks (RNNs) are a natural fit for sequential data, but they are hard to train on long sequences because gradients vanish or explode. Transformers have succeeded in many fields, but the memory and compute cost of their attention mechanism grows quadratically with context length, which limits their use on long sequences. To address these issues, the paper proposes a new state space model (SSM) built on the spectral filtering algorithm, called the spectral state space model. Its main contributions are:

1. **A new sequence prediction architecture**: The input sequence is projected onto a small, specially structured subspace, and spectral filtering is used to model linear dynamical systems, which enables efficient modeling of long-range dependencies.
2. **Theoretical advantages**: The spectral state space model is provably robust: its performance depends on neither the spectrum of the underlying dynamical system nor the dimensionality of the problem. The models also use fixed convolutional filters that require no learning, yet still outperform traditional SSMs.
3. **Experimental validation**: The models are evaluated on synthetic dynamical systems and on long-range prediction tasks across several modalities. The results support the theoretical advantages of spectral filtering for tasks that require very long memory.

Overall, the paper aims to provide a more efficient and more stable approach to sequence modeling with long-range dependencies by introducing the spectral state space model.
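
To make the idea of "fixed convolutional filters that require no learning" concrete, here is a minimal NumPy sketch. It rests on an assumption that this summary does not state: in the spectral filtering work cited in the abstract (Hazan et al., 2017), the filters are the top eigenvectors of a fixed Hankel matrix with entries Z[i, j] = 2 / ((i + j)^3 - (i + j)), using 1-based indices. The function names `spectral_filters` and `spectral_features` are hypothetical, the input is taken to be a scalar sequence for simplicity, and the learned parts of the architecture (the projections applied to these features, any autoregressive terms, layer stacking) are left out entirely.

```python
import numpy as np

def spectral_filters(seq_len: int, k: int):
    """Return the k largest eigenvalues and matching eigenvectors of the
    Hankel matrix Z[i, j] = 2 / ((i + j)^3 - (i + j)), with 1-based i, j.
    (Assumed from Hazan et al., 2017; nothing here is learned from data.)"""
    idx = np.arange(1, seq_len + 1)
    s = idx[:, None] + idx[None, :]           # s[i, j] = i + j
    Z = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)      # eigenvalues in ascending order
    # Keep the top k and flip to descending order.
    return eigvals[-k:][::-1], eigvecs[:, -k:][:, ::-1]

def spectral_features(u, eigvals, eigvecs):
    """Causally convolve a scalar sequence u (length L) with each filter.
    Column j holds sum_i filter_j[i] * u[t - i], scaled by eigvals[j]**0.25.
    Returns an array of shape (L, k)."""
    L = len(u)
    k = eigvecs.shape[1]
    feats = np.stack(
        [np.convolve(u, eigvecs[:, j])[:L] for j in range(k)], axis=1
    )
    # Clip guards against tiny negative eigenvalues from round-off.
    return feats * np.clip(eigvals, 0.0, None) ** 0.25

# Example: features for a random length-64 sequence using 4 filters.
rng = np.random.default_rng(0)
u = rng.standard_normal(64)
vals, vecs = spectral_filters(seq_len=64, k=4)
features = spectral_features(u, vals, vecs)   # shape (64, 4)
```

In a model along these lines, only a linear map applied to `features` would be trained. The filters depend solely on the sequence length, so they can be computed once and reused.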