Data streaming platform for crowd-sourced vehicle dataset generation

Felipe Mogollon, Zaloa Fernandez, Angel Martin, Juan Diego Ortega, Gorka Velez
DOI: https://doi.org/10.1109/TIV.2024.3486926
2024-10-29
Abstract: Vehicles are sophisticated machines equipped with sensors that provide real-time data for onboard driving assistance systems. Due to the wide variety of traffic, road, and weather conditions, continuous system enhancements are essential. Connectivity allows vehicles to transmit previously unknown data, expanding datasets and accelerating the development of new data models. This enables faster identification and integration of novel data, improving system reliability and reducing time to market. Data Spaces aim to create a data-driven, interconnected, and innovative data economy, where edge and cloud infrastructures support a virtualised IoT platform that connects data sources and development servers. This paper proposes an edge-cloud data platform to connect car data producers with multiple and heterogeneous services, addressing key challenges in Data Spaces, such as data sovereignty, governance, interoperability, and privacy. The paper also evaluates the data platform's performance limits for text, image, and video data workloads, examines the impact of connectivity technologies, and assesses latencies. The results show that latencies drop to 33 ms with 5G connectivity when pipelining data to consuming applications hosted at the edge, compared to around 77 ms when crossing both edge and cloud processing infrastructures. The results offer guidance on the necessary processing assets to avoid bottlenecks in car data platforms.
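To put the two latency figures from the abstract side by side, the short sketch below computes the absolute and relative saving of the edge-only path. Only the 33 ms and (approximate) 77 ms values come from the abstract; the constant and function names are illustrative assumptions, not part of the paper.

```python
# Latency figures quoted in the abstract, in milliseconds.
EDGE_ONLY_MS = 33.0        # 5G, consuming application hosted at the edge
EDGE_PLUS_CLOUD_MS = 77.0  # approximate; path crosses edge and cloud


def relative_reduction(baseline_ms: float, improved_ms: float) -> float:
    """Return the fraction of the baseline latency that is saved."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - improved_ms) / baseline_ms


saving_ms = EDGE_PLUS_CLOUD_MS - EDGE_ONLY_MS
saving_ratio = relative_reduction(EDGE_PLUS_CLOUD_MS, EDGE_ONLY_MS)

print(f"absolute saving: {saving_ms:.0f} ms")
print(f"relative saving: {saving_ratio:.1%}")
```

Because the 77 ms figure is stated as approximate, the resulting ratio should be read as a rough indication rather than a measured result.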
Networking and Internet Architecture
What problem does this paper attempt to address?
The core problem this paper attempts to solve is how to build an efficient, secure, and scalable data-flow platform for the real-time collection, processing, and sharing of vehicle data, thereby accelerating the development of Advanced Driver-Assistance Systems (ADAS) and Autonomous Driving (AD) technologies. Specifically, the paper focuses on the following aspects:

1. **Data Sovereignty, Governance, Interoperability and Privacy**:
   - How to ensure data sovereignty (i.e., the data owner's control over their data) while meeting governance, interoperability, and privacy requirements as intelligent vehicle data is generated and consumed.
   - The data platform needs to share data securely among different stakeholders while protecting user privacy.
2. **Low Latency in Data Transmission and Processing**:
   - How to reduce latency in data transmission and processing by combining edge computing and cloud computing, especially over 5G networks.
   - The results show that, with 5G connectivity, latency drops to 33 milliseconds when data is pipelined to applications hosted at the edge, compared with around 77 milliseconds when it crosses both edge and cloud infrastructures.
3. **Management of Data Complexity and Workloads**:
   - How to handle different types of workloads (text, image, video, etc.) and keep the data platform performing well under high concurrency and complex data.
   - The paper evaluates the impact of different connectivity technologies and processing methods on performance and provides optimization suggestions.
4. **Data Quality and Trust Mechanisms**:
   - How to evaluate and ensure data quality, authenticity, and reliability.
   - The paper uses the European Telecommunications Standards Institute (ETSI) standard TS 103 759 V2.1.1 (2023-01) to evaluate data quality and detects contradictions through redundant information to enhance data trustworthiness.
5. **Economics and Business Models**:
   - How to monetize data through the data platform, for example by pricing licenses, data consumption, and the allocation of pre-processing computing resources.
   - A pay-as-you-go mechanism lets consumers use data services flexibly according to actual needs.

In summary, this paper proposes a data-flow platform based on an edge-cloud architecture to address the key challenges in intelligent vehicle data generation and sharing, especially data sovereignty, low-latency processing, complex workload management, data quality and trust mechanisms, and economically viable business models.
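The idea of detecting contradictions through redundant information (item 4 above) can be illustrated with a minimal sketch: a vehicle's reported speed is compared with the speed implied by two consecutive positions. This is not the check specified in ETSI TS 103 759, nor the paper's implementation; the `Sample` type, the function names, the local metric coordinate frame, and the 2 m/s tolerance are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical vehicle report in a local metric frame."""
    t: float      # timestamp in seconds
    x: float      # position east, in metres
    y: float      # position north, in metres
    speed: float  # speed reported by the vehicle, in m/s


def implied_speed(a: Sample, b: Sample) -> float:
    """Speed in m/s implied by the distance travelled between two samples."""
    dt = b.t - a.t
    if dt <= 0:
        raise ValueError("samples must be in strictly increasing time order")
    return math.hypot(b.x - a.x, b.y - a.y) / dt


def is_consistent(a: Sample, b: Sample, tolerance: float = 2.0) -> bool:
    """Check that reported and position-derived speeds agree within tolerance.

    The tolerance (m/s) is an arbitrary illustrative value.
    """
    reported = (a.speed + b.speed) / 2.0
    return abs(implied_speed(a, b) - reported) <= tolerance


# Usage: the second pair claims 10 m/s but moved 30 m in one second.
start = Sample(t=0.0, x=0.0, y=0.0, speed=10.0)
plausible = Sample(t=1.0, x=10.0, y=0.0, speed=10.0)
suspicious = Sample(t=1.0, x=30.0, y=0.0, speed=10.0)

print(is_consistent(start, plausible))
print(is_consistent(start, suspicious))
```

A real deployment would need to account for sensor noise, map projection, and acceleration between samples; the sketch only shows how two redundant signals can expose a contradiction.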