Tuning Frequency Bias of State Space Models

Annan Yu, Dongwei Lyu, Soon Hoe Lim, Michael W. Mahoney, N. Benjamin Erichson
2024-10-03
Abstract: State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model training. We show that the initialization of an SSM assigns it an innate frequency bias and that training the model in a conventional way does not alter this bias. Based on our theory, we propose two mechanisms to tune frequency bias: either by scaling the initialization to tune the inborn frequency bias; or by applying a Sobolev-norm-based filter to adjust the sensitivity of the gradients to high-frequency inputs, which allows us to change the frequency bias via training. Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging an 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.
Machine Learning
What problem does this paper attempt to address?
The paper addresses **frequency bias in state space models (SSMs) that process sequences with long-range dependencies**. Specifically, it shows that SSMs capture low-frequency components more effectively than high-frequency ones, which is consistent with the broader notion of frequency bias in deep learning model training. By analyzing the transfer functions of linear, time-invariant (LTI) systems, the paper finds that the initialization of an SSM gives it an innate frequency bias and that conventional training does not alter this bias.

To tune the bias, the paper proposes two mechanisms:

1. **Scale the initialization to tune the innate frequency bias**: Changing the scale factor α applied to the initialization parameters changes how well the SSM can learn different frequency components.
2. **Apply a Sobolev-norm-based filter**: Introducing a weight factor (1 + |s|)^β, where s denotes frequency, adjusts the sensitivity of the gradients to high-frequency inputs, so the frequency bias can be changed through training.

With these mechanisms, the paper shows how to strengthen, weaken, or even reverse the frequency bias of SSMs, and it verifies this empirically on an image-denoising task. Tuning the frequency bias also improves SSM performance on long-range sequence learning, with an average accuracy of 88.26% on the Long-Range Arena (LRA) benchmark tasks.
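The two quantities named above, the scale factor α and the weight (1 + |s|)^β, can be illustrated with a small NumPy sketch. It is not the paper's implementation. It assumes a diagonal continuous-time LTI system with transfer function G(is) = Σ_j c_j b_j / (is − a_j), and a hypothetical initialization a_j = −0.5 + i·α·j in which α scales the imaginary parts of the poles. The names `transfer_function`, `scaled_init`, and `sobolev_weight` are illustrative only.

```python
import numpy as np

def transfer_function(s, a, b, c):
    """Evaluate G(is) = sum_j c_j * b_j / (i*s - a_j) at real frequencies s.

    a, b, c are 1-D arrays describing a diagonal LTI system (an assumption
    of this sketch, not a statement about the paper's exact parameterization).
    """
    s = np.asarray(s, dtype=float)
    return np.sum((c * b)[None, :] / (1j * s[:, None] - a[None, :]), axis=1)

def scaled_init(n, alpha):
    """Hypothetical initialization: poles a_j = -0.5 + i * alpha * j.

    alpha stretches or compresses the imaginary parts, i.e. the range of
    frequencies at which the system has poles.
    """
    return -0.5 + 1j * alpha * np.arange(n)

def sobolev_weight(s, beta):
    """Frequency weight (1 + |s|)^beta; grows with |s| if beta > 0, decays if beta < 0."""
    return (1.0 + np.abs(np.asarray(s, dtype=float))) ** beta

n = 32
b = np.ones(n)
c = np.ones(n) / n

low, high = np.array([0.0]), np.array([100.0])

# Small alpha: poles cluster near zero frequency.
a_small = scaled_init(n, alpha=0.1)
# Large alpha: poles spread out to high frequencies.
a_large = scaled_init(n, alpha=5.0)

gain_small_low = np.abs(transfer_function(low, a_small, b, c))[0]
gain_small_high = np.abs(transfer_function(high, a_small, b, c))[0]
gain_large_low = np.abs(transfer_function(low, a_large, b, c))[0]
gain_large_high = np.abs(transfer_function(high, a_large, b, c))[0]

print("alpha=0.1: |G| at s=0, s=100:", gain_small_low, gain_small_high)
print("alpha=5.0: |G| at s=0, s=100:", gain_large_low, gain_large_high)
print("Sobolev weight at s=100, beta=+1 and beta=-1:",
      sobolev_weight(high, 1.0)[0], sobolev_weight(high, -1.0)[0])
```

Under these assumptions, the small-α system responds more strongly at s = 0 and the large-α system responds more strongly at s = 100, which is one way to see how the initialization scale can shift which frequencies the system is sensitive to. The Sobolev weight equals 1 at s = 0 for every β, so the sign of β alone decides whether high frequencies are up-weighted or down-weighted relative to low ones.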