Abstract: Long Short Term Memory Fully Convolutional Neural Networks (LSTM-FCN) and Attention LSTM-FCN (ALSTM-FCN) have been shown to achieve state-of-the-art performance on the task of classifying time series signals on the old University of California-Riverside (UCR) time series repository. However, there has been no study on why LSTM-FCN and ALSTM-FCN perform well. In this paper, we perform a series of ablation tests (3627 experiments) on LSTM-FCN and ALSTM-FCN to provide a better understanding of the model and each of its sub-modules. Results from the ablation tests on ALSTM-FCN and LSTM-FCN show that the LSTM and the FCN blocks perform better when applied in a conjoined manner. Two z-normalizing techniques, z-normalizing each sample independently and z-normalizing the whole dataset, are compared using a Wilcoxon signed-rank test to show a statistical difference in performance. In addition, we provide an understanding of the impact dimension shuffle has on LSTM-FCN by comparing its performance with LSTM-FCN when no dimension shuffle is applied. Finally, we demonstrate the performance of the LSTM-FCN when the LSTM block is replaced by a GRU, basic RNN, and Dense Block.
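
The two z-normalizing techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy data, and the small `eps` guard against division by zero are assumptions made for illustration, and "whole dataset" is read here as a single mean and standard deviation computed over every value in the dataset.

```python
from statistics import fmean, pstdev


def z_normalize_per_sample(dataset, eps=1e-8):
    """Z-normalize each series with its own mean and standard deviation."""
    normalized = []
    for series in dataset:
        mean = fmean(series)
        std = pstdev(series)  # population standard deviation of this series
        normalized.append([(x - mean) / (std + eps) for x in series])
    return normalized


def z_normalize_whole_dataset(dataset, eps=1e-8):
    """Z-normalize all series with one mean and std taken over every value.

    Assumption: the statistics are pooled over all samples and time steps.
    """
    all_values = [x for series in dataset for x in series]
    mean = fmean(all_values)
    std = pstdev(all_values)
    return [[(x - mean) / (std + eps) for x in series] for series in dataset]


# Toy dataset (hypothetical): two series with the same shape but different scale.
dataset = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
per_sample = z_normalize_per_sample(dataset)
whole = z_normalize_whole_dataset(dataset)
```

In this toy case, per-sample normalization maps both series to nearly identical values, because it removes each series' own offset and scale. Whole-dataset normalization keeps the difference in level and amplitude between the two series.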

What problem does this paper attempt to address?

The problem this paper attempts to address is: why do LSTM-FCN (Long Short-Term Memory Fully Convolutional Network) and ALSTM-FCN (Attention Long Short-Term Memory Fully Convolutional Network) perform well in time series classification tasks. Specifically, the authors conduct a series of ablation tests to analyze the various sub-modules of LSTM-FCN and ALSTM-FCN in detail to understand why these models achieve the current best performance.

### Main Research Content:

1. **Ablation Tests**: The authors conducted 3627 experiments to systematically evaluate the impact of each sub-module of LSTM-FCN and ALSTM-FCN on overall performance.
2. **Normalization Techniques Comparison**: The performance differences between two z-normalization techniques (independent normalization for each sample and normalization for the entire dataset) were compared, and the Wilcoxon signed-rank test was used to verify statistical significance.
3. **Impact of Dimension Shuffle**: The impact of dimension shuffle on the performance of LSTM-FCN was explored, comparing the cases with and without dimension shuffle.
4. **Performance of Alternative Modules**: The performance changes were studied when replacing the LSTM block with GRU, basic RNN, and dense blocks.

### Research Background:

- Time series classification is a widely studied field involving various practical application scenarios such as weather forecasting, stock market data, EEG/ECG, etc.
- LSTM-FCN and ALSTM-FCN are among the best-performing models on the UCR time series classification benchmark dataset, but their internal mechanisms have not been fully explained.

### Research Methods:

- **Dataset**: Experiments were conducted using the UCR time series classification benchmark dataset.
- **Model Structure**: The same structure as the original model was maintained, and the optimal number of LSTM units was found through grid search.
- **Training Strategy**: The Adam optimizer was used for gradient descent, with an initial learning rate set to 1e-3, and the learning rate was gradually reduced when the training loss no longer improved.
- **Evaluation Metrics**: The main evaluation metrics included classification accuracy and mean per-class error (MPCE).

### Research Results:

- **Synergy of LSTM and FCN Blocks**: Experimental results showed that the performance is better when LSTM and FCN blocks are used together.
- **Normalization Techniques**: z-normalization for the entire dataset performed better than independent normalization for each sample.
- **Dimension Shuffle**: LSTM-FCN with dimension shuffle outperformed the version without dimension shuffle in most cases.
- **Alternative Modules**: Replacing the LSTM block with other types of recurrent neural network modules resulted in a performance decline but still remained competitive.

### Conclusion:

- Through detailed ablation tests, the authors revealed the specific contributions and synergy mechanisms of each sub-module of LSTM-FCN and ALSTM-FCN.
- These findings help to better understand and optimize these models, providing valuable references for future time series classification research.
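
The dimension shuffle discussed above can be pictured as a transpose of the time axis and the variable axis. The sketch below is illustrative only and is not the authors' implementation. The helper names and the toy series are hypothetical, and it assumes the recurrent block reads its input as (time steps, variables).

```python
def dimension_shuffle(series):
    """Transpose a series from (time steps, variables) to (variables, time steps)."""
    return [list(column) for column in zip(*series)]


def shape(matrix):
    """Return (rows, columns) of a rectangular nested list."""
    return (len(matrix), len(matrix[0]))


# Hypothetical univariate series: 5 time steps, 1 variable per step.
univariate = [[0.1], [0.4], [0.9], [1.6], [2.5]]

# After the shuffle, a recurrent block that treats the first axis as time
# would see a single step carrying all 5 values at once.
shuffled = dimension_shuffle(univariate)

print(shape(univariate))  # (5, 1)
print(shape(shuffled))    # (1, 5)
```

Under that layout assumption, the version without dimension shuffle steps through the series one value at a time. The shuffled version hands the recurrent block the whole series in a single step, which is the contrast the comparison in the summary refers to.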

Insights into LSTM Fully Convolutional Networks for Time Series Classification

Multivariate LSTM-FCNs for time series classification

LSTM-MFCN: A time series classifier based on multi-scale spatial–temporal features

A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models

Effective LSTMs with Seasonal-Trend Decomposition and Adaptive Learning and Niching-Based Backtracking Search Algorithm for Time Series Forecasting

Attention-based LSTM-CNNs for Time-series Classification

Attention Based CNN-LSTM Network for Anomaly Pattern Classification of Multivariate Time Series

Bidirectional LSTM-RNN-based Hybrid Deep Learning Frameworks for Univariate Time Series Classification

Improving Time Series Classification Algorithms Using Octave-Convolutional Layers

Time Series Sequences Classification with Inception and LSTM Module

A survey on long short-term memory networks for time series prediction

Deep Gated Recurrent and Convolutional Network Hybrid Model for Univariate Time Series Classification

Combining contextual neural networks for time series classification

Time Series Prediction Based on LSTM and High-Order Fuzzy Cognitive Map with Attention Mechanism

Deep Learning with Long Short-Term Memory for Time Series Prediction

End-to-end Multivariate Time Series Classification Via Hybrid Deep Learning Architectures

CTS-LSTM: LSTM-based Neural Networks for Correlated Time Series Prediction

NOA-LSTM: An Efficient LSTM cell architecture for Time Series forecasting

CTFNet: Long-Sequence Time-Series Forecasting Based on Convolution and Time–Frequency Analysis

RNTS: Robust Neural Temporal Search for Time Series Classification