MIPD: A Multi-sensory Interactive Perception Dataset for Embodied Intelligent Driving

Zhiwei Li, Tingzhen Zhang, Meihua Zhou, Dandan Tang, Pengwei Zhang, Wenzhuo Liu, Qiaoning Yang, Tianyu Shen, Kunfeng Wang, Huaping Liu
2024-11-08
Abstract: During driving, humans usually rely on multiple senses to gather information and make decisions. Analogously, achieving embodied intelligence in autonomous driving requires integrating multidimensional sensory information so that the vehicle can interact with its environment. However, current multi-modal fusion sensing schemes often neglect these additional sensory inputs, hindering the realization of fully autonomous driving. This paper considers multi-sensory information and proposes a multi-modal interactive perception dataset named MIPD, which extends the current autonomous driving algorithm framework and supports research on embodied intelligent driving. In addition to conventional camera, lidar, and 4D radar data, the dataset incorporates further sensor inputs, including sound, light intensity, vibration intensity, and vehicle speed, to make it more comprehensive. Comprising 126 consecutive sequences, many exceeding twenty seconds, MIPD features over 8,500 meticulously synchronized and annotated frames. It also covers many challenging scenarios across various road and lighting conditions. The dataset has undergone thorough experimental validation, producing valuable insights for the exploration of next-generation autonomous driving frameworks.
Robotics
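The abstract stresses that MIPD's frames are "meticulously synchronized" across sensors that report at different rates. The summary does not say how the authors align these streams. The sketch below shows one common approach, which is an assumption and not MIPD's documented pipeline: each scalar auxiliary reading (speed, light intensity, and so on) is matched to the nearest annotated keyframe timestamp, and the match is rejected if it falls outside a tolerance. The `Frame` record, the stream names (`speed_mps`, `light_lux`), and the 50 ms tolerance are all hypothetical. Sound would need windowing rather than a single nearest sample, so it is left out.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical reading format: (timestamp in seconds, scalar value).
Reading = Tuple[float, float]


@dataclass
class Frame:
    """Hypothetical per-keyframe record; field names are illustrative, not MIPD's schema."""
    timestamp: float
    aux: Dict[str, Optional[float]] = field(default_factory=dict)


def nearest(times: List[float], values: List[float], t: float, tol: float) -> Optional[float]:
    """Return the value whose timestamp is closest to t, or None if none lies within tol.

    Assumes `times` is sorted ascending and aligned index-by-index with `values`.
    """
    if not times:
        return None
    i = bisect_left(times, t)
    # The closest sample is either just before or at/after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    j = min(candidates, key=lambda k: abs(times[k] - t))
    return values[j] if abs(times[j] - t) <= tol else None


def synchronize(keyframes: List[float], streams: Dict[str, List[Reading]],
                tol: float = 0.05) -> List[Frame]:
    """Attach the nearest reading from every auxiliary stream to each keyframe."""
    # Split each sorted stream into parallel timestamp/value lists once, up front.
    split = {name: ([ts for ts, _ in s], [v for _, v in s]) for name, s in streams.items()}
    return [
        Frame(t, {name: nearest(ts, vs, t, tol) for name, (ts, vs) in split.items()})
        for t in keyframes
    ]


# Usage with made-up readings: speed at 10 Hz, light intensity at a sparser rate.
streams = {
    "speed_mps": [(0.00, 5.0), (0.10, 5.2), (0.20, 5.4)],
    "light_lux": [(0.02, 800.0), (0.31, 790.0)],
}
frames = synchronize([0.0, 0.1, 0.2], streams, tol=0.05)
```

In this usage, the light sensor has no sample within 50 ms of the second and third keyframes, so those entries come back as `None`. Making the gap explicit, rather than silently reusing a stale value, lets downstream code decide whether to interpolate or drop the reading.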
What problem does this paper attempt to address?
The paper addresses the shortcomings of current multi-modal fusion perception schemes for achieving fully autonomous driving. Existing multi-modal datasets often overlook additional sensory inputs such as sound, light intensity, vibration intensity, and vehicle speed. This limits how comprehensively autonomous driving systems can understand and adapt to complex environments. The paper therefore proposes a new multi-modal interactive perception dataset, MIPD, which aims to strengthen the perception ability of autonomous driving systems by integrating data from multiple sensors.

### Main problems in the paper:

1. **Limitations of multi-modal datasets**: Existing datasets (such as KITTI, nuScenes, Waymo, and Argoverse) provide rich visual data but lack multi-dimensional perception. This gap is most visible under complex environmental changes, such as varying lighting or road-surface conditions.
2. **Incomplete environmental perception**: Existing autonomous driving systems perceive the environment mainly through traditional sensors such as cameras, lidars, and radars, without jointly considering multi-sensory information such as sound, light intensity, and vibration.
3. **Adaptability to dynamic environments**: Autonomous driving systems must make fast, accurate decisions in changing environments, and existing datasets and algorithms fall short in this regard.

### Solutions:

1. **Construct a multi-modal interactive perception dataset**: The paper proposes a new multi-modal dataset, MIPD. Besides traditional camera, lidar, and 4D radar data, it integrates sound, light intensity, vibration intensity, and vehicle speed.
2. **Enrich the content of the dataset**: The dataset contains 126 consecutive sequences, many of which exceed 20 seconds, for a total of more than 8,500 carefully synchronized and annotated frames. It covers challenging scenarios under various road and lighting conditions.
3. **Experimental verification**: The effectiveness of the collected dataset is verified through experiments with several single-modal and multi-modal models.

### Main contributions:

1. **Multi-modal dataset**: A new multi-modal dataset that integrates cameras, point clouds, 4D radar, sound, vibration, light intensity, and vehicle speed to enhance perception tasks.
2. **Dataset content**: 126 consecutive sequences, many exceeding 20 seconds, with more than 8,500 synchronized and annotated frames covering challenging scenarios under various road, weather, and lighting conditions.
3. **Experimental verification**: Experiments with several single-modal and multi-modal models verify the effectiveness of the dataset.

Through these contributions, the paper aims to advance autonomous driving technology, particularly research on multi-modal interactive perception and environmental adaptability.
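The contributions describe integrating scalar sensors (vibration, light intensity, vehicle speed) with camera and point-cloud data to enhance perception, but the summary does not specify a fusion architecture. The sketch below shows one simple option, which is an assumption rather than the models evaluated in the paper. Each auxiliary reading is min-max normalized to [0, 1] and paired with a validity flag, and the result is concatenated onto an existing visual feature vector (early fusion by concatenation). `AUX_RANGES`, the sensor names, and their ranges are hypothetical placeholders, not values taken from MIPD.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical (min, max) normalization ranges per auxiliary sensor; not MIPD values.
AUX_RANGES: Dict[str, Tuple[float, float]] = {
    "speed_mps": (0.0, 40.0),
    "light_lux": (0.0, 100000.0),
    "vibration_g": (0.0, 2.0),
}


def encode_aux(readings: Mapping[str, Optional[float]]) -> List[float]:
    """Encode each auxiliary reading as [normalized value, validity flag].

    Values are min-max scaled and clipped to [0, 1]; a missing reading (None or
    absent key) becomes [0.0, 0.0] so it is distinguishable from a true zero.
    """
    out: List[float] = []
    for name, (lo, hi) in AUX_RANGES.items():  # fixed order keeps the layout stable
        value = readings.get(name)
        if value is None:
            out.extend([0.0, 0.0])
        else:
            scaled = (value - lo) / (hi - lo)
            out.extend([min(max(scaled, 0.0), 1.0), 1.0])
    return out


def fuse(visual_features: List[float], readings: Mapping[str, Optional[float]]) -> List[float]:
    """Early fusion by concatenation: visual features followed by encoded auxiliary features."""
    return list(visual_features) + encode_aux(readings)


# Usage: a 2-D stand-in for a visual embedding plus one frame's auxiliary readings.
fused = fuse([0.3, -1.2], {"speed_mps": 10.0, "light_lux": None, "vibration_g": 3.0})
```

The validity flag matters because sensors drop out independently of one another. Without it, a missing light reading would be indistinguishable from total darkness. Real multi-modal models would typically learn a projection of these features rather than concatenate raw values, so treat this only as an illustration of how heterogeneous scalar inputs can share one representation.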