Abstract: In this paper we present a new framework for time-series modeling that combines the best of traditional statistical models and neural networks. We focus on time-series with long-range dependencies, which are needed for monitoring fine-granularity data (e.g., minutes, seconds, milliseconds) and are prevalent in operational use cases. Traditional models, such as auto-regression fitted with least squares (Classic-AR), can describe a time-series with a concise and interpretable model. When dealing with long-range dependencies, however, Classic-AR models can become intractably slow to fit on large data. Recently, sequence-to-sequence models such as Recurrent Neural Networks, which were originally intended for natural language processing, have become popular for time-series. However, they can be overly complex for typical time-series data and lack interpretability. A scalable and interpretable model is needed to bridge the statistical and deep learning-based approaches. As a first step towards this goal, we propose modeling AR-process dynamics using a feed-forward neural network approach, termed AR-Net. We show that AR-Net is as interpretable as Classic-AR but also scales to long-range dependencies. Our results lead to three major conclusions. First, AR-Net learns the same AR-coefficients as Classic-AR and is therefore equally interpretable. Second, the computational complexity with respect to the order of the AR process is linear for AR-Net, compared with quadratic for Classic-AR. This makes it possible to model long-range dependencies within fine-granularity data. Third, by introducing regularization, AR-Net automatically selects and learns sparse AR-coefficients. This eliminates the need to know the exact order of the AR process and makes it possible to learn sparse weights for a model with long-range dependencies.
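The first and third conclusions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the AR coefficients 0.2, 0.3, -0.5 are illustrative values, the "network" is reduced to a single bias-free linear layer, training uses full-batch gradient descent rather than a deep learning framework, and a plain L1 penalty applied by soft-thresholding stands in for the regularizer, which the abstract does not specify. All function names are hypothetical.

```python
import numpy as np

def simulate_ar(coeffs, n, noise_std=1.0, seed=0):
    """Draw n samples from y[t] = sum_i coeffs[i] * y[t-1-i] + noise."""
    rng = np.random.default_rng(seed)
    coeffs = np.asarray(coeffs, dtype=float)
    p = len(coeffs)
    y = np.zeros(n + p)
    for t in range(p, n + p):
        past = y[t - p:t][::-1]  # [y[t-1], ..., y[t-p]]
        y[t] = coeffs @ past + rng.normal(0.0, noise_std)
    return y[p:]

def lagged_design(y, p):
    """Return (X, target) where row k of X is [y[t-1], ..., y[t-p]] for t = p + k."""
    n = len(y)
    X = np.column_stack([y[p - 1 - i:n - 1 - i] for i in range(p)])
    return X, y[p:]

def fit_least_squares(y, p):
    """Classic-AR baseline: ordinary least squares on the lagged design matrix."""
    X, target = lagged_design(y, p)
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w

def fit_linear_layer(y, p, steps=5000, l1=0.0):
    """Bias-free linear layer trained by gradient descent on mean squared error.

    Assumption: with l1 > 0, an L1 penalty is applied via soft-thresholding
    (proximal gradient); this is a stand-in, not the paper's regularizer.
    """
    X, target = lagged_design(y, p)
    n = len(target)
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(steps):
        grad = X.T @ (X @ w - target) / n
        w = w - step * grad
        # Soft-thresholding; this is the identity when l1 == 0.
        w = np.sign(w) * np.maximum(np.abs(w) - step * l1, 0.0)
    return w

true_coeffs = [0.2, 0.3, -0.5]  # illustrative AR(3) process
y = simulate_ar(true_coeffs, n=5000)

w_ls = fit_least_squares(y, p=3)   # Classic-AR weights
w_gd = fit_linear_layer(y, p=3)    # linear-layer weights, no penalty
print("least squares:   ", w_ls)
print("gradient descent:", w_gd)

# Over-specified order with an L1 penalty: superfluous lags are shrunk.
w_ls_10 = fit_least_squares(y, p=10)
w_sparse = fit_linear_layer(y, p=10, l1=0.05)
print("L1-penalized, p=10:", w_sparse)
```

Because both fits minimize the same squared error over the same lagged inputs, the unpenalized layer weights converge to the least-squares coefficients and can be read directly as AR-coefficients. In the over-specified fit, the penalty shrinks weights toward zero, which is the mechanism that lets the order be chosen generously rather than known exactly.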

FNetAR: Mixing Tokens with Autoregressive Fourier Transforms

Transformer Neural Autoregressive Flows

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition

Autoregressive Moving-average Attention Mechanism for Time Series Forecasting

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers

Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design

AR-Net: A simple Auto-Regressive Neural Network for time-series

New Approaches to Long Document Summarization: Fourier Transform Based Attention in a Transformer Model

A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition

Speeding up Transformer Decoding via an Attention Refinement Network

Accelerating Transformer for Neural Machine Translation

AbFTNet: An Efficient Transformer Network with Alignment before Fusion for Multimodal Automatic Modulation Recognition

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

FL-Net: A multi-scale cross-decomposition network with frequency external attention for long-term time series forecasting

End-to-End ASR with Adaptive Span Self-Attention

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

A Lexical-aware Non-autoregressive Transformer-based ASR Model

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer