Abstract: Traffic flow prediction is an important component of self-driving. Traffic flow is closely related to population distribution: it depends not only on the absolute size of the population but also on people's concerns and interests. Accurate spatio-temporal web traffic flow prediction is critical in many applications, such as bandwidth allocation, anomaly detection, congestion control and admission control. Most existing traffic flow prediction methods use models based on time-series analysis and remain inadequate for many real-world applications. Web traffic flow is found to be strongly associated with the spatio-temporal distribution of the population. Increasingly, it is critical to understand the relationship between population patterns and web traffic flow patterns and to make decisions based on it. Different groups of people have been shown to respond differently to web events. Due to the complexity of spatial data structures and the huge volume of web traffic flow log data, it is difficult to routinely find the relationship between web events and population distributions without an appropriate processing framework. In this paper, we propose a framework named GeoTrafficPredict to support accurate spatio-temporal prediction of web traffic flow. GeoTrafficPredict provides a machine learning platform that learns the spatio-temporal pattern of traffic flow and uses that pattern to predict the trend in both the spatial and temporal dimensions. GeoTrafficPredict also provides a data aggregation portal and cloud-based computation functions. GeoTrafficPredict deploys a series of computational images in a cloud computing environment, and the implementation on China's CSTNET illustrates the performance of our platform.
What problem does this paper attempt to address?
The paper addresses network traffic prediction, in particular accurate prediction across both the spatial and temporal dimensions. It introduces a machine learning system named GeoTraPredict, which aims to improve prediction accuracy by analyzing the relationships among population distribution, network events, and network traffic. The paper presents such prediction as important for applications such as autonomous driving, route planning in smart cities, bandwidth allocation, anomaly detection, congestion control, and admission control.
The paper points out that most existing traffic prediction methods are based on time-series analysis and remain inadequate for many real-world applications. Network traffic is closely related to population distribution and is influenced not only by the number of people but also by their interests and activities. Understanding the relationship between population patterns and network traffic patterns, and making decisions based on it, is therefore becoming increasingly important. The proposed method clusters both the population and network events, explores the correlation between network traffic and population categories, and analyzes the two together. By learning how traffic variations relate to network events and to the characteristics of internet users, the system can predict fluctuations in network traffic so that appropriate measures can be taken in advance.
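The idea of correlating network traffic with population categories can be illustrated with a minimal sketch. It is not the paper's method: the summary does not say which correlation measure is used, so Pearson correlation is assumed here, and the category names and per-region numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one value per region, in the same region order.
traffic = [10.0, 20.0, 30.0, 40.0]          # observed traffic volume
population = {                               # size of each population category
    "students": [1.0, 2.0, 3.0, 4.0],
    "retirees": [4.0, 3.0, 2.0, 1.0],
}

# Score each category by how strongly its regional size tracks traffic.
scores = {name: pearson(counts, traffic) for name, counts in population.items()}
```

In this toy data the "students" category rises with traffic and the "retirees" category falls with it, so the two scores have opposite signs. A real analysis would work on many regions and time windows and would need to account for confounding factors.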
The paper describes the GeoTraPredict framework in detail, covering data collection, model training, the prediction process, and model integration on the cloud platform. The framework organizes and stores data in a spatio-temporal cube model, combining the advantages of distributed and traditional databases to handle both large data volumes and real-time data streams. The paper also discusses how several models and techniques (such as event extraction models, time-series models, and spatial analysis models) can be used to explore how spatio-temporal data change in response to different events, in support of applications such as congestion control. Lastly, the paper outlines how model integration is implemented on the cloud platform to meet resource management needs, shorten analysis time, and improve prediction efficiency.
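To make the spatio-temporal cube idea concrete, the sketch below bins traffic log records into cells indexed by longitude, latitude, and time, and sums the bytes in each cell. It is an assumption-laden illustration, not the paper's storage design: the record layout `(lon, lat, unix_time, bytes)`, the function names `build_cube` and `time_series`, the one-degree cell size, and the one-hour time slot are all hypothetical.

```python
from collections import defaultdict
from math import floor


def build_cube(records, cell_deg=1.0, slot_seconds=3600):
    """Aggregate (lon, lat, unix_time, bytes) records into a sparse cube.

    Keys are (lon_index, lat_index, time_index); values are total bytes.
    """
    cube = defaultdict(int)
    for lon, lat, ts, nbytes in records:
        key = (floor(lon / cell_deg), floor(lat / cell_deg), ts // slot_seconds)
        cube[key] += nbytes
    return dict(cube)


def time_series(cube, lon_idx, lat_idx):
    """Slice the cube along the time axis for one spatial cell."""
    return sorted(
        (t, total) for (x, y, t), total in cube.items() if (x, y) == (lon_idx, lat_idx)
    )


# Hypothetical log records: (longitude, latitude, unix_time, bytes).
records = [
    (116.3, 39.9, 0, 500),
    (116.7, 39.2, 1800, 300),
    (116.4, 39.9, 3600, 200),
    (121.5, 31.2, 100, 900),
]
cube = build_cube(records)
series = time_series(cube, 116, 39)
```

Slicing along the time axis, as `time_series` does, yields the per-cell series that a time-series model could consume, while fixing the time index instead would give a spatial snapshot for spatial analysis. A sparse dictionary is used because most cells in a large region are empty at any given time.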