Insights into LSTM Fully Convolutional Networks for Time Series Classification

Fazle Karim, Somshubra Majumdar, Houshang Darabi
DOI: https://doi.org/10.1109/ACCESS.2019.2916828
2019-07-02
Abstract: Long Short Term Memory Fully Convolutional Neural Networks (LSTM-FCN) and Attention LSTM-FCN (ALSTM-FCN) have been shown to achieve state-of-the-art performance on the task of classifying time series signals on the old University of California-Riverside (UCR) time series repository. However, there has been no study on why LSTM-FCN and ALSTM-FCN perform well. In this paper, we perform a series of ablation tests (3627 experiments) on LSTM-FCN and ALSTM-FCN to provide a better understanding of the model and each of its sub-modules. Results from the ablation tests on ALSTM-FCN and LSTM-FCN show that the LSTM and the FCN blocks perform better when applied in a conjoined manner. Two z-normalizing techniques, z-normalizing each sample independently and z-normalizing the whole dataset, are compared using a Wilcoxon signed-rank test to show a statistical difference in performance. In addition, we provide an understanding of the impact dimension shuffle has on LSTM-FCN by comparing its performance with LSTM-FCN when no dimension shuffle is applied. Finally, we demonstrate the performance of the LSTM-FCN when the LSTM block is replaced by a GRU, basic RNN, and Dense Block.
Machine Learning
What problem does this paper attempt to address?
This paper asks why LSTM-FCN (Long Short-Term Memory Fully Convolutional Network) and ALSTM-FCN (Attention Long Short-Term Memory Fully Convolutional Network) perform well on time series classification tasks. The authors run a series of ablation tests on the sub-modules of both models to understand why they achieve state-of-the-art performance.

### Main Research Content:
1. **Ablation Tests**: The authors ran 3627 experiments to systematically evaluate how each sub-module of LSTM-FCN and ALSTM-FCN affects overall performance.
2. **Normalization Techniques Comparison**: Two z-normalization techniques (normalizing each sample independently and normalizing the entire dataset) were compared, and a Wilcoxon signed-rank test was used to check whether the performance difference is statistically significant.
3. **Impact of Dimension Shuffle**: The effect of dimension shuffle on LSTM-FCN was examined by comparing the model with and without it.
4. **Performance of Alternative Modules**: The change in performance was studied when the LSTM block is replaced with a GRU, a basic RNN, or a dense block.

### Research Background:
- Time series classification is a widely studied field with practical applications such as weather forecasting, stock market data, and EEG/ECG analysis.
- LSTM-FCN and ALSTM-FCN are among the best-performing models on the UCR time series classification benchmark, but why they perform well had not been studied.

### Research Methods:
- **Dataset**: Experiments were conducted on the UCR time series classification benchmark datasets.
- **Model Structure**: The structure of the original models was kept unchanged, and the optimal number of LSTM units was found through grid search.
- **Training Strategy**: The models were trained with the Adam optimizer at an initial learning rate of 1e-3, and the learning rate was reduced whenever the training loss stopped improving.
- **Evaluation Metrics**: The main evaluation metrics were classification accuracy and mean per-class error (MPCE).

### Research Results:
- **Synergy of LSTM and FCN Blocks**: The LSTM and FCN blocks performed better when used together than when used separately.
- **Normalization Techniques**: z-normalization of the entire dataset performed better than independent normalization of each sample.
- **Dimension Shuffle**: LSTM-FCN with dimension shuffle outperformed the version without dimension shuffle in most cases.
- **Alternative Modules**: Replacing the LSTM block with a GRU, a basic RNN, or a dense block reduced performance, although the resulting models remained competitive.

### Conclusion:
- Through detailed ablation tests, the authors show what each sub-module of LSTM-FCN and ALSTM-FCN contributes and how the sub-modules work together.
- These findings help to better understand and optimize these models and provide a reference for future time series classification research.
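
The two z-normalization techniques, dimension shuffle, and MPCE named above can be illustrated with a minimal sketch. It is not the authors' code, and every function name in it is hypothetical. It assumes that dimension shuffle means transposing a sample's time and variable axes before it reaches the LSTM block, and that MPCE is the average over datasets of each dataset's error rate divided by its number of classes. The summary does not spell out either definition. The sketch uses only the Python standard library.

```python
from statistics import mean, pstdev


def z_normalize_per_sample(dataset):
    """Z-normalize each series with its own mean and standard deviation."""
    normalized = []
    for series in dataset:
        mu = mean(series)
        sigma = pstdev(series) or 1.0  # assumption: leave constant series unscaled
        normalized.append([(x - mu) / sigma for x in series])
    return normalized


def z_normalize_whole_dataset(dataset):
    """Z-normalize all series with one mean and std computed over every value."""
    values = [x for series in dataset for x in series]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # assumption: same guard as above
    return [[(x - mu) / sigma for x in series] for series in dataset]


def dimension_shuffle(sample):
    """Transpose one sample from (time steps, variables) to (variables, time steps).

    Assumed meaning of "dimension shuffle": a univariate series of N time steps
    is presented to the LSTM as a single step with N variables.
    """
    return [list(row) for row in zip(*sample)]


def mean_per_class_error(error_rates, class_counts):
    """Assumed MPCE definition: mean over datasets of error rate / class count."""
    return mean(e / c for e, c in zip(error_rates, class_counts))


if __name__ == "__main__":
    # Two toy series with the same shape but very different scales.
    toy = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    print("per sample:   ", z_normalize_per_sample(toy))
    print("whole dataset:", z_normalize_whole_dataset(toy))
    print("shuffled:     ", dimension_shuffle([[1.0], [2.0], [3.0]]))
    print("MPCE:         ", mean_per_class_error([0.1, 0.2], [2, 4]))
```

In the toy data, per-sample normalization maps both series to the same values because it discards each series' scale, while whole-dataset normalization keeps the scale difference between them. That is one plausible reason the two techniques can lead to different classification results.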