Overcoming Data Limitations in Internet Traffic Forecasting: LSTM Models with Transfer Learning and Wavelet Augmentation

Sajal Saha, Anwar Haque, Greg Sidebottom
2024-09-20
Abstract: Effective internet traffic prediction in smaller ISP networks is challenged by limited data availability. This paper explores this issue using transfer learning and data augmentation techniques with two LSTM-based models, LSTMSeq2Seq and LSTMSeq2SeqAtn, initially trained on a comprehensive dataset provided by Juniper Networks and subsequently applied to smaller datasets. The datasets represent real internet traffic telemetry, offering insights into diverse traffic patterns across different network domains. Our study revealed that while both models performed well in single-step predictions, multi-step forecasts were challenging, particularly in terms of long-term accuracy. In smaller datasets, LSTMSeq2Seq generally outperformed LSTMSeq2SeqAtn, indicating that higher model complexity does not necessarily translate to better performance. The models' effectiveness varied across different network domains, reflecting the influence of distinct traffic characteristics. To address data scarcity, Discrete Wavelet Transform was used for data augmentation, leading to significant improvements in model performance, especially in shorter-term forecasts. Our analysis showed that data augmentation is crucial in scenarios with limited data. Additionally, the study included an analysis of the models' variability and consistency, with attention mechanisms in LSTMSeq2SeqAtn providing better short-term forecasting consistency but greater variability in longer forecasts. The results highlight the benefits and limitations of different modeling approaches in traffic prediction. Overall, this research underscores the importance of transfer learning and data augmentation in enhancing the accuracy of traffic prediction models, particularly in smaller ISP networks with limited data availability.
Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses data limitations in small Internet Service Provider (ISP) networks, with the goal of improving the accuracy of internet traffic forecasting. It approaches the problem from four angles:

1. **Data Scarcity Challenge**: Real-world internet traffic data is diverse and complex, which makes it hard to train accurate predictive models on small datasets. Traditional statistical methods (such as ARIMA and SARIMA) struggle with nonlinear features, while machine learning and deep learning methods perform better but require large amounts of historical data.
2. **Application of Transfer Learning**: Knowledge from models trained on a large-scale dataset is transferred to small datasets to improve prediction performance. This is particularly useful for large ISPs that manage diverse networks, since collecting a large dataset for each network is often impractical.
3. **Data Augmentation Techniques**: Discrete Wavelet Transform (DWT) is used for data augmentation to expand the target domain dataset. This mitigates the shortcomings of small datasets and improves the model's generalization ability.
4. **Multi-Target Domain Prediction**: Tailored predictive models are developed for different network segments (such as residential, commercial, or educational areas). Reusing a pretrained model in this way simplifies model development and reduces the time and resources needed to build and deploy independent models.

Through these methods, the paper examines how effective the combination of transfer learning and data augmentation is, and it proposes a systematic framework for determining the minimum dataset size in the target domain that can benefit from transfer learning. The approach offers new insights into time series forecasting in small ISP networks.
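To make the DWT-based augmentation in point 3 more concrete, the following is a minimal sketch of one common way wavelets are used to generate extra training series: decompose the signal, rescale the detail (high-frequency) coefficients, and reconstruct. The summary does not describe the paper's exact augmentation procedure, so the single-level Haar wavelet, the random scaling of detail coefficients, and the function names `haar_dwt`, `haar_idwt`, and `augment` are all assumptions made for illustration.

```python
import math
import random


def haar_dwt(signal):
    """Single-level Haar DWT of an even-length sequence.

    Returns (approx, detail): the low-frequency trend and the
    high-frequency fluctuation coefficients.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the sequence from its coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def augment(signal, n_copies=3, max_scale=0.2, seed=0):
    """Create synthetic variants of a series (illustrative assumption).

    The approximation coefficients are kept, so the overall trend is
    unchanged; the detail coefficients are scaled by a random factor in
    [1 - max_scale, 1 + max_scale] before reconstruction.
    """
    rng = random.Random(seed)
    approx, detail = haar_dwt(signal)
    copies = []
    for _ in range(n_copies):
        factor = 1.0 + rng.uniform(-max_scale, max_scale)
        copies.append(haar_idwt(approx, [d * factor for d in detail]))
    return copies


if __name__ == "__main__":
    # Hypothetical traffic samples, not data from the paper.
    traffic = [10.0, 12.0, 9.0, 15.0, 20.0, 18.0, 11.0, 13.0]
    for variant in augment(traffic):
        print([round(v, 2) for v in variant])
```

Because only the detail coefficients change, each synthetic series keeps the same pairwise sums (and therefore the same coarse trend) as the original while varying its short-term fluctuations. In practice a wavelet library such as PyWavelets, a different wavelet family, or a multi-level decomposition could be substituted; which of these the authors used is not stated here.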