Abstract: Robust visual understanding of complex urban environments using passive optical sensors is an onerous yet essential task for autonomous navigation. Performance on this task depends heavily on the quality of the available dataset and the number of instances it includes. Regardless of how perception algorithms score on benchmarks, a model is only reliable and capable of enhanced decision making if its dataset covers the exact domain of the end-use case. To increase the number of instances in datasets used for the training and validation of Autonomous Vehicles (AV), Advanced Driver Assistance Systems (ADAS), and autonomous driving, and to fill the gap left by the absence of any dataset for railway smart mobility, we introduce our multimodal hybrid dataset, ESRORAD. ESRORAD comprises 34 videos, 2.7 k virtual images, and 100 k real images of both road and railway scenes collected in two Normandy towns, Rouen and Le Havre. All images are annotated with 3D bounding boxes covering at least three classes: persons, cars, and bicycles. Crucially, our dataset is the first of its kind, built with uncompromising effort toward large volume, abundant annotation, and diverse scenes. Our accompanying study provides an in-depth analysis of the dataset's characteristics as well as a performance evaluation with various state-of-the-art models trained on other popular datasets, namely KITTI and nuScenes. Some examples of image annotations and the prediction results of our lightweight 3D object detection algorithms are available in the ESRORAD dataset. Finally, the dataset is available online. The repository consists of 52 datasets with their respective annotations.

nuScenes: A Multimodal Dataset for Autonomous Driving

MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions

doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving

RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

Vision meets robotics: The KITTI dataset

KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D

Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection

The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Road and Railway Smart Mobility: A High-Definition Ground Truth Hybrid Dataset

aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving

Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception

Zenseact Open Dataset: A large-scale and diverse multimodal dataset for autonomous driving

RACECAR -- The Dataset for High-Speed Autonomous Racing

SceNDD: A Scenario-based Naturalistic Driving Dataset

AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving

LiDAR-CS Dataset: LiDAR Point Cloud Dataset with Cross-Sensors for 3D Object Detection