Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies

Lincan Li, Wei Shao, Wei Dong, Yijun Tian, Qiming Zhang, Kaixiang Yang, Wenjie Zhang
2024-01-27
Abstract: The aspiration of next-generation autonomous driving (AD) technology relies on the dedicated integration and interaction among intelligent perception, prediction, planning, and low-level control. There has been a huge bottleneck regarding the upper bound of autonomous driving algorithm performance, and the consensus in academia and industry is that the key to surmounting the bottleneck lies in data-centric autonomous driving technology. Recent advances in AD simulation, closed-loop model training, and AD big data engines have yielded valuable experience. However, there is a lack of systematic knowledge and deep understanding regarding how to build efficient data-centric AD technology for AD algorithm self-evolution and better AD big data accumulation. To fill the identified research gaps, this article closely reviews the state-of-the-art data-driven autonomous driving technologies, with an emphasis on a comprehensive taxonomy of autonomous driving datasets characterized by milestone generations, key features, data acquisition settings, etc. Furthermore, we provide a systematic review of the existing benchmark closed-loop AD big data pipelines from the industrial frontier, including the procedure of closed-loop frameworks, key technologies, and empirical studies. Finally, the future directions, potential applications, limitations and concerns are discussed to encourage efforts from both academia and industry to promote the further development of autonomous driving. The project repository is available at:
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the data bottleneck issue in autonomous driving (AD) technology and promote the development of data-driven autonomous driving technology. Specifically, the paper focuses on the following aspects:

1. **Data Bottleneck Issue**:
   - The current performance ceiling of autonomous driving algorithms is limited mainly due to the insufficiency of datasets. Existing datasets struggle to cover all driving scenarios, especially under the long-tail distribution problem, where rare edge cases are extremely underrepresented in the datasets.
2. **Development of Data-Driven Systems**:
   - There is a need to establish an efficient data-driven closed-loop system to facilitate the self-evolution of autonomous driving algorithms and the accumulation of big data. This includes multiple stages such as data collection, annotation, simulation, and closed-loop training.
3. **Data Mining and Generation Techniques**:
   - Advanced data mining techniques and sophisticated data generation methods are utilized to address the long-tail distribution problem, ensuring that autonomous driving systems can operate stably in various complex environments.
4. **Application of Closed-Loop Systems**:
   - A systematic review of existing industrial-grade closed-loop data-driven autonomous driving systems is conducted, including their workflows, key technologies, and practical application effects. Through closed-loop systems, autonomous driving systems can continuously learn and improve from real-world driving.
5. **Future Development Directions**:
   - The paper discusses the advantages and limitations of current methods and proposes future research directions to further advance autonomous driving technology.

In summary, this paper aims to systematically review the latest data-driven autonomous driving technologies, fill existing research gaps, enhance industrial R&D efficiency, and provide valuable references for the academic community.