DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera
DOI: https://doi.org/10.1109/cvpr52733.2024.02092
2024-01-01
Computer Vision and Pattern Recognition
Abstract: We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision are limited to either synthetic environments or a narrow selection of real-world scenes, and are therefore insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured at 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we obtained encouraging results in a pilot study on learning a generalizable NeRF from DL3DV-10K, which demonstrates the need for a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representations. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.
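To put the headline statistics in perspective, the short sketch below derives the mean number of frames per video from the two counts quoted in the abstract. It assumes the rounded figure "51.2 million" is exact, so the result is only approximate, and a mean says nothing about how video lengths are actually distributed in the dataset. The constant and function names are hypothetical and are not part of any DL3DV-10K release.

```python
# Hypothetical constants taken from the abstract's headline statistics.
# The frame count is quoted rounded ("51.2 million"), so derived values are approximate.
TOTAL_FRAMES = 51_200_000
NUM_VIDEOS = 10_510


def mean_frames_per_video(total_frames: int, num_videos: int) -> float:
    """Return the average number of frames per video (hypothetical helper)."""
    if num_videos <= 0:
        raise ValueError("num_videos must be positive")
    return total_frames / num_videos


if __name__ == "__main__":
    avg = mean_frames_per_video(TOTAL_FRAMES, NUM_VIDEOS)
    # Prints roughly 4872 frames per video on average.
    print(f"~{avg:.0f} frames per video")
```

Under these assumptions each video contributes on the order of 4,900 frames on average. Individual videos may be considerably shorter or longer.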