Towards Edge-Based Data Lake Architecture for Intelligent Transportation System

Danilo Fernandes, Douglas L. L. Moura, Gean Santos, Geymerson S. Ramos, Fabiane Queiroz, Andre L. L. Aquino
DOI: https://doi.org/10.1145/3616394.3618270
2024-09-04
Abstract: The rapid urbanization growth has underscored the need for innovative solutions to enhance transportation efficiency and safety. Intelligent Transportation Systems (ITS) have emerged as a promising solution in this context. However, analyzing and processing the massive and intricate data generated by ITS presents significant challenges for traditional data processing systems. This work proposes an Edge-based Data Lake Architecture to integrate and analyze the complex data from ITS efficiently. The architecture offers scalability, fault tolerance, and performance, improving decision-making and enhancing innovative services for a more intelligent transportation ecosystem. We demonstrate the effectiveness of the architecture through an analysis of three different use cases: (i) Vehicular Sensor Network, (ii) Mobile Network, and (iii) Driver Identification applications.
Databases, Artificial Intelligence, Networking and Internet Architecture
What problem does this paper attempt to address?
This paper addresses the efficient processing and analysis of big data in Intelligent Transportation Systems (ITS). As urbanization accelerates, transportation systems face growing challenges to efficiency and safety, including air pollution, noise, traffic congestion, and accidents. ITS respond to these challenges by integrating advanced communication and information technologies to support applications and services such as real-time traffic management, road safety, intelligent parking, and self-driving vehicles. However, the large volume of complex data that ITS generate is difficult for traditional data processing systems to analyze and process. The paper therefore proposes an Edge-based Data Lake Architecture to integrate and analyze this data effectively. The architecture provides scalability, fault tolerance, and performance, which help improve decision-making and enhance innovative services in the intelligent transportation ecosystem. Specifically, the paper targets the following problems:

1. **Processing of large-scale and heterogeneous data**: ITS generate huge amounts of data from diverse sources, including vehicles, road infrastructure, and citizens. Traditional data processing systems struggle to process this data effectively.
2. **High mobility and frequent disconnections**: Vehicles usually travel at high speeds, which causes frequent disconnections, especially when the communication range is limited.
3. **High communication and computing overhead of cloud architectures**: Centralized cloud architectures may incur high latency and waste computing resources when processing large amounts of data.
To solve these problems, the paper proposes a data lake architecture based on edge computing that uses Multi-access Edge Computing (MEC) infrastructure to provide processing and storage resources at the network edge. Distributed servers integrate and clean the data and run inference on it, supporting decision-making applications in ITS. The paper demonstrates the effectiveness of the architecture through three use cases: a vehicular sensor network, a mobile network, and a driver identification application.

### Use-case analysis

1. **Vehicular Sensor Network (VSN) application**:
   - **Background**: VSN is a remote sensing paradigm. Vehicles equipped with sensing devices, powerful processing units, and wireless communication capabilities act as mobile sensors that monitor the urban environment.
   - **Problem**: Sensor data must be uploaded to the monitoring center regularly, but transmitting large amounts of data directly over the cellular network consumes network resources.
   - **Solution**: Apply data offloading: select a small number of vehicles as aggregation points, which collect data from other vehicles through device-to-device (D2D) communication and upload it.
   - **Effect**: The experimental results show that the proposed scheme increases the data aggregation rate by 10.45% during peak hours, reducing upload cost and bandwidth consumption.
2. **Mobile network application**:
   - **Background**: Mobile devices transmit data over the cellular network to optimize mobile services.
   - **Problem**: The application requires both low-latency processing and large-scale storage and analysis.
   - **Solution**: The edge data lake handles real-time and small-batch processing, and the cloud data lake handles large-scale storage and analysis.
   - **Effect**: The split enables low-latency applications and improves the quality of mobile services.
3. **Driver identification application**:
   - **Background**: Specific drivers are identified from the data of different car sensors.
   - **Problem**: The application requires both rapid inference and model training.
   - **Solution**: The edge data lake performs pre-processing and low-latency inference, and the cloud data lake trains the model and returns the trained model to the edge.
   - **Effect**: The split improves the accuracy and response speed of driver identification.

In conclusion, the paper addresses several key problems in big data processing and analysis for intelligent transportation systems by proposing a data lake architecture based on edge computing, and the use cases show its potential and effectiveness in practical applications.
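
The division of labor between the edge and cloud data lakes in the driver identification use case can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `EdgeDataLake` and `CloudDataLake`, the record layout, and the nearest-centroid classifier, which stands in for whatever model the paper actually trains. It only shows the flow described above: the edge cleans data and runs inference locally, while the cloud stores the data, trains the model, and returns it to the edge.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import Optional

# Hypothetical record layout: (driver label or None, sensor feature vector).
Record = tuple[Optional[str], list[Optional[float]]]
# Hypothetical model: driver label -> centroid of that driver's features.
Model = dict[str, list[float]]


@dataclass
class CloudDataLake:
    """Large-scale storage and model training (illustrative stand-in)."""
    storage: list[Record] = field(default_factory=list)

    def store(self, batch: list[Record]) -> None:
        self.storage.extend(batch)

    def train(self) -> Model:
        # Nearest-centroid "training": average the features per labelled driver.
        by_driver: dict[str, list[list[float]]] = {}
        for driver, features in self.storage:
            if driver is not None:
                by_driver.setdefault(driver, []).append(features)
        return {
            driver: [fmean(column) for column in zip(*rows)]
            for driver, rows in by_driver.items()
        }


@dataclass
class EdgeDataLake:
    """Pre-processing and low-latency inference close to the vehicles."""
    cloud: CloudDataLake
    model: Model = field(default_factory=dict)
    buffer: list[Record] = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        # Cleaning step: drop records with empty or missing sensor readings.
        _, features = record
        if features and all(value is not None for value in features):
            self.buffer.append(record)

    def flush(self) -> None:
        # Upload the cleaned batch, then fetch the retrained model from the cloud.
        self.cloud.store(self.buffer)
        self.buffer = []
        self.model = self.cloud.train()

    def identify(self, features: list[float]) -> Optional[str]:
        # Inference uses the local model copy, with no round trip to the cloud.
        if not self.model:
            return None

        def squared_distance(centroid: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.model, key=lambda d: squared_distance(self.model[d]))
```

In this sketch, calling `edge.ingest(...)` for each incoming record, then `edge.flush()`, and finally `edge.identify(features)` mirrors the cycle of edge pre-processing, cloud training, and edge inference. A real deployment would replace the in-memory lists with durable storage and move the training call off the request path.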