Sequence Modeling with Multiresolution Convolutional Memory

Jiaxin Shi, Ke Alexander Wang, Emily B. Fox
2023-11-02
Abstract: Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most a $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper proposes a new sequence modeling method aimed at efficiently capturing the long-range patterns in sequence data that matter for a given task (such as classification or generative modeling). The main contributions are:
1. **Multiresolution Convolution Layer (MultiresLayer)**: Inspired by wavelet multiresolution analysis (MRA), the authors design a new sequence modeling building block, the MultiresLayer, whose core is the multiresolution convolution (MultiresConv). This convolution captures trends at different time scales in the input sequence.
2. **Efficient Memory Mechanism**: The multiresolution convolution builds a memory of past data at each time step. To keep computation and parameter count manageable, the paper proposes the TreeSelect mechanism, which retains only a selected subset of the representation coefficients as the memory vector.
3. **Theoretical Foundation**: When the filters are fixed to predefined wavelet filters, the multiresolution convolution reduces to the traditional discrete wavelet transform. The model instead makes these filters learnable, so it can outperform manually designed wavelet filters.
4. **Simple and Powerful Architecture**: The MultiresLayer is built from simple dilated causal convolutions and linear transformations, which makes it easy to parallelize and parameter-efficient. Its multiresolution structure also makes the model theoretically interpretable.
5. **Experimental Results**: The paper evaluates the model on several sequence classification and autoregressive density estimation tasks, including CIFAR-10 image sequence classification, list operation prediction on ListOps, and multi-label classification of electrocardiograms on the PTB-XL dataset. According to the reported results, the proposed model achieves state-of-the-art performance on these tasks.
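The multiresolution convolution and its wavelet special case described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it handles a single channel, omits TreeSelect and the linear mixing step, and the function names (`causal_dilated_conv`, `multires_conv`), the filter orientation, and the choice to double the dilation at each level are assumptions made for illustration. It applies one shared pair of filters `h0` (low-pass) and `h1` (high-pass) at every level of a dilated causal convolution tree.

```python
import numpy as np


def causal_dilated_conv(x, h, dilation):
    """Return y[t] = sum_k h[k] * x[t - k * dilation], treating x as zero before t = 0."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x), dtype=float)
    for k, w in enumerate(h):
        shift = k * dilation
        if shift < len(x):
            # Output position t only sees inputs at or before t (causal).
            y[shift:] += w * x[: len(x) - shift]
    return y


def multires_conv(x, h0, h1, depth):
    """Hypothetical sketch of a multiresolution convolution.

    The same filters h0 and h1 are reused at every level; only the
    dilation (assumed here to be 2**j at level j) changes.
    Returns the coarsest approximation and the per-level detail signals.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(depth):
        dilation = 2 ** j
        details.append(causal_dilated_conv(approx, h1, dilation))
        approx = causal_dilated_conv(approx, h0, dilation)
    return approx, details


# Fixed Haar filters: with these, the tree computes an undecimated Haar transform.
haar_h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
haar_h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)

signal = np.arange(8, dtype=float)
approx, details = multires_conv(signal, haar_h0, haar_h1, depth=3)
```

With the Haar filters, the level-3 approximation at the last time step is a scaled sum of the previous eight samples, which is the sense in which fixed wavelet filters recover a classical wavelet transform. In a learnable version, `h0` and `h1` would be trainable parameters. Each level stores one length-$N$ array, so taking the depth to be about $\log_2 N$ is consistent with the $\mathcal{O}(N\log N)$ memory footprint stated in the abstract.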
In summary, the paper addresses the problem of capturing long-range dependencies in sequence modeling by introducing a new architecture based on multiresolution analysis, and it reports experiments indicating that the approach is effective and competitive with prior methods.