Tuning Frequency Bias of State Space Models

Annan Yu, Dongwei Lyu, Soon Hoe Lim, Michael W. Mahoney, N. Benjamin Erichson

2024-10-03

Abstract: State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model training. We show that the initialization of an SSM assigns it an innate frequency bias and that training the model in a conventional way does not alter this bias. Based on our theory, we propose two mechanisms to tune frequency bias: either by scaling the initialization to tune the inborn frequency bias; or by applying a Sobolev-norm-based filter to adjust the sensitivity of the gradients to high-frequency inputs, which allows us to change the frequency bias via training. Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging an 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.

Machine Learning

What problem does this paper attempt to address?

The paper addresses **frequency bias in state-space models (SSMs) applied to sequences with long-range dependencies**. Specifically, it shows that SSMs capture low-frequency components more effectively than high-frequency ones, which is consistent with the broader notion of frequency bias in deep-learning model training. By analyzing the transfer functions of linear time-invariant (LTI) systems, the paper finds that the initialization of an SSM gives it an innate frequency bias, and that conventional training does not change this bias. To address this, the paper proposes two mechanisms for tuning the frequency bias:

1. **Scale the initialization**: Adjusting the scale factor α of the initialization changes the innate frequency bias, and with it how well the SSM learns different frequency components.
2. **Apply a Sobolev-norm-based filter**: Introducing a weight factor (1 + |s|)^β adjusts the sensitivity of the gradients to high-frequency inputs, so that the frequency bias can be changed through training.

Using an image-denoising task, the paper shows empirically that both mechanisms can strengthen, weaken, or even reverse the frequency bias of SSMs. Tuning the frequency bias also improves performance on long-range sequence tasks: the models average 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.
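To make the two mechanisms more concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a diagonal LTI system with transfer function G(is) = Σ_j c_j b_j / (is − a_j), assumes an S4D-Lin-style initialization a_j = −1/2 + iπj, and assumes that α scales the imaginary parts of the a_j; the summary above only says that α scales the initialization. The function names `transfer_function`, `scaled_poles`, and `sobolev_weight` are hypothetical.

```python
import numpy as np


def transfer_function(s, a, b, c):
    """Evaluate G(i s) = sum_j c_j * b_j / (i s - a_j) at real frequencies s.

    Assumes a diagonal state matrix with entries a_j (negative real parts).
    """
    s = np.asarray(s, dtype=float)
    # Broadcast frequencies (rows) against the diagonal entries (columns).
    return np.sum((c * b)[None, :] / (1j * s[:, None] - a[None, :]), axis=1)


def scaled_poles(n, alpha):
    """Hypothetical scaled initialization: a_j = -1/2 + i * alpha * pi * j.

    Assumption: alpha multiplies the imaginary parts, so a larger alpha
    spreads the poles over a wider frequency range.
    """
    j = np.arange(n)
    return -0.5 + 1j * alpha * np.pi * j


def sobolev_weight(s, beta):
    """Frequency weight (1 + |s|)^beta from the summary above.

    beta > 0 emphasizes high frequencies; beta < 0 de-emphasizes them.
    """
    return (1.0 + np.abs(s)) ** beta


if __name__ == "__main__":
    n = 8
    b = np.ones(n)
    c = np.ones(n)
    freqs = np.linspace(0.0, 100.0, 5)
    for alpha in (1.0, 5.0):
        gain = np.abs(transfer_function(freqs, scaled_poles(n, alpha), b, c))
        print(f"alpha={alpha}: |G| at freqs {freqs} ->", np.round(gain, 4))
    for beta in (-1.0, 0.0, 1.0):
        print(f"beta={beta}: weights ->", np.round(sobolev_weight(freqs, beta), 4))
```

In this sketch the weight would multiply a frequency-domain error before it is summed into a loss; how the paper applies the filter inside training is not specified in the summary, so that step is left out.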

Towards a theory of learning dynamics in deep state space models

Spectral State Space Models

From Generalization Analysis to Optimization Designs for State Space Models

Self-Organizing State-Space Models with Artificial Dynamics

Deep Learning-based Approaches for State Space Models: A Selective Review

Deep State Space Models for Nonlinear System Identification

HOPE for a Robust Parameterization of Long-memory State Space Models

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Parameter-Efficient Fine-Tuning of State Space Models

Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models

SMR: State Memory Replay for Long Sequence Modeling

Robustifying State-space Models for Long Sequences via Approximate Diagonalization

Coupling LSTM neural networks and state-space models through analytically tractable inference

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Nudging state-space models for Bayesian filtering under misspecified dynamics

Time-SSM: Simplifying and Unifying State Space Models for Time Series Forecasting

Theoretical Foundations of Deep Selective State-Space Models

Kalman-SSM: Modeling Long-Term Time Series With Kalman Filter Structured State Spaces

SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model for Long Sequences Learning

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections