TrafficScene: A Multi-modal Dataset Including Light Field for Semantic Segmentation of Traffic Scenes

Jie Luo, Xin Jin, Mingyu Liu, Yihui Fan
DOI: https://doi.org/10.1109/icme57554.2024.10687943
2024-01-01
Abstract: High-quality annotated data is crucial for semantic segmentation. However, existing datasets either provide single-view images or offer small-baseline multi-angle light field images with annotations only for the central view, which hinders the development of multi-angle perception capabilities. In this paper, we introduce the first large-baseline light field multimodal semantic segmentation dataset, collected using a 3×3 camera array and a LiDAR. To the best of our knowledge, it is the largest light field multimodal dataset, containing 623 light field images with full annotations for every view, along with 623 frames of point clouds. We then propose a baseline method, PSPNet_LGA, for segmenting all views of a light field image. It combines the local information from each angular view with the global features of the light field, which improves segmentation of every perspective. Our experimental results show that our method achieves an average improvement of 1.13 mIoU over the single-image semantic segmentation baseline. Additionally, we offer a real-scene dataset with highly accurate ground truth for light field depth estimation, establishing a benchmark in this area. Our dataset and code will be made available at https://github.com/rogercomeon/TrafficScene.
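The 1.13 figure is measured in mean Intersection-over-Union (mIoU), the standard semantic segmentation metric: for each class, IoU = TP / (TP + FP + FN) counted over pixels, and mIoU is the mean of the per-class IoUs. The sketch below is a minimal pure-Python illustration of that standard definition, not the authors' evaluation code. The function names, the `ignore_index=255` convention for unlabeled pixels, and the idea of averaging scores over the nine views of the 3×3 array are assumptions made for illustration only.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (ground-truth, predicted) label pairs over flattened pixels."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumption: unlabeled pixels are marked with ignore_index
        cm[t][p] += 1  # row = ground-truth class, column = predicted class
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Standard mIoU: average of TP / (TP + FP + FN) over classes that occur."""
    cm = confusion_matrix(pred, target, num_classes, ignore_index)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # class-c pixels labeled otherwise
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # other pixels labeled as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_iou_over_views(view_preds: Sequence[Sequence[int]],
                        view_targets: Sequence[Sequence[int]],
                        num_classes: int, ignore_index: int = 255) -> float:
    """Hypothetical helper: average per-view mIoU across the views of a light field
    (e.g. 9 views for a 3x3 array). The paper's exact averaging scheme is not stated."""
    scores = [mean_iou(p, t, num_classes, ignore_index)
              for p, t in zip(view_preds, view_targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two classes, four pixels: class 0 IoU = 1/2, class 1 IoU = 2/3, mIoU = 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Published mIoU values are usually reported as percentages, so, assuming the paper follows that convention, a gain of 1.13 would correspond to roughly 0.0113 on the 0-to-1 scale returned here. Many benchmarks also accumulate one confusion matrix over the whole test set instead of averaging per image; which variant the paper uses is not stated in the abstract.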