Flipped Classroom: Effective Teaching for Time Series Forecasting

Philipp Teutsch, Patrick Mäder
DOI: https://doi.org/10.48550/arXiv.2210.08959
2022-10-17
Abstract: Sequence-to-sequence models based on LSTM and GRU are a popular choice for forecasting time series data, reaching state-of-the-art performance. Training such models can be delicate, though. The two most common training strategies in this context are teacher forcing (TF) and free running (FR). TF can help the model converge faster but may provoke an exposure bias issue due to a discrepancy between the training and inference phases. FR helps to avoid this but does not necessarily lead to better results, since it tends to make training slow and unstable instead. Scheduled sampling was the first approach to tackle these issues by picking the best of both worlds and combining it into a curriculum learning (CL) strategy. Although scheduled sampling seems to be a convincing alternative to FR and TF, we found that, even if parametrized carefully, scheduled sampling may lead to premature termination of the training when applied to time series forecasting. To mitigate the problems of the above approaches, we formalize CL strategies along the training scale as well as the training iteration scale. We propose several new curricula and systematically evaluate their performance in two experimental sets. For our experiments, we utilize six datasets generated from prominent chaotic systems. We found that the newly proposed increasing training scale curricula with a probabilistic iteration scale curriculum consistently outperform previous training strategies, yielding an NRMSE improvement of up to 81% over FR or TF training. For some datasets we additionally observe a reduced number of training iterations. We observed that all models trained with the new curricula yield higher prediction stability, allowing for longer prediction horizons.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses shortcomings of the common training strategies for sequence-to-sequence time series forecasting models, teacher forcing (TF) and free running (FR), particularly on time series data from chaotic systems. Specifically:

1. **Teacher Forcing (TF)**: TF speeds up convergence, but the input distribution differs between training and inference, which causes exposure bias. The model is never exposed to its own erroneous predictions during training, so it tolerates small errors poorly at inference time, which limits its performance over long prediction horizons.
2. **Free Running (FR)**: FR avoids exposure bias and can make the model more robust to its own errors, but it tends to make training slow and unstable and does not necessarily lead to better results.
3. **Limitations of existing solutions**: Scheduled sampling combines TF and FR into a curriculum, but when applied to time series forecasting it may lead to premature termination of training, even if parametrized carefully.

To address these problems, the authors formalize curriculum learning (CL) strategies along two scales, the training scale and the training iteration scale, and propose several new curricula. By adjusting how TF and FR are mixed over the course of training, these curricula aim to improve the prediction stability and accuracy of the model, especially on data from chaotic systems. The main contributions of the paper are:

- Proposing several new CL curricula and systematically evaluating them in two experimental sets on six datasets generated from prominent chaotic systems.
- Finding that the newly proposed increasing training scale curricula combined with a probabilistic iteration scale curriculum consistently outperform TF and FR training, with an improvement in NRMSE (normalized root mean square error) of up to 81%.
- Observing a reduced number of training iterations for some datasets, and higher prediction stability for all models trained with the new curricula, which allows for longer prediction horizons.

In conclusion, the paper improves the accuracy and stability of time series forecasting models by changing the training strategy rather than the model, with a focus on time series data from chaotic systems.
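The mixing of TF and FR described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `linear_ratio` and `rollout`, the linear schedule, and the scalar one-step model `step` are all hypothetical. The sketch assumes that the training scale curriculum sets a teacher-forcing ratio `eps` per epoch (here increasing linearly) and that the probabilistic iteration scale curriculum decides at every time step, with probability `eps`, whether the next input is the ground truth (TF) or the model's own prediction (FR).

```python
import random
from typing import Callable, List, Sequence


def linear_ratio(epoch: int, num_epochs: int, start: float, end: float) -> float:
    """Training-scale curriculum (assumed linear): teacher-forcing ratio per epoch."""
    if num_epochs <= 1:
        return end
    frac = epoch / (num_epochs - 1)
    return start + frac * (end - start)


def rollout(step: Callable[[float], float],
            truth: Sequence[float],
            eps: float,
            rng: random.Random) -> List[float]:
    """Iteration-scale curriculum (probabilistic): predict truth[1:] step by step.

    At each step the next model input is the ground-truth value with
    probability eps (teacher forcing) and the model's own previous
    prediction otherwise (free running).
    eps = 1.0 is pure TF, eps = 0.0 is pure FR.
    """
    preds: List[float] = []
    x = truth[0]  # the first input is always an observed value
    for t in range(1, len(truth)):
        y = step(x)  # one-step prediction from the current input
        preds.append(y)
        use_truth = rng.random() < eps
        x = truth[t] if use_truth else y
    return preds


if __name__ == "__main__":
    rng = random.Random(0)
    toy_model = lambda x: x + 1.0  # hypothetical stand-in for an LSTM/GRU step
    series = [0.0, 10.0, 20.0, 30.0]
    for epoch in range(5):
        eps = linear_ratio(epoch, 5, start=0.0, end=1.0)
        print(epoch, eps, rollout(toy_model, series, eps, rng))
```

In an actual training loop, the predictions returned by `rollout` would be compared against `truth[1:]` to compute the loss. Swapping `start` and `end` gives a decreasing schedule in the style of scheduled sampling.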