Abstract: We introduce a new RGB-D object dataset captured in the wild called WildRGB-D. Unlike most existing real-world object-centric datasets which only come with RGB capturing, the direct capture of the depth channel allows better 3D annotations and broader downstream applications. WildRGB-D comprises large-scale category-level RGB-D object videos, which are taken using an iPhone to go around the objects in 360 degrees. It contains around 8500 recorded objects and nearly 20000 RGB-D videos across 46 common object categories. These videos are taken with diverse cluttered backgrounds with three setups to cover as many real-world scenarios as possible: (i) a single object in one video; (ii) multiple objects in one video; and (iii) an object with a static hand in one video. The dataset is annotated with object masks, real-world scale camera poses, and reconstructed aggregated point clouds from RGB-D videos. We benchmark four tasks with WildRGB-D including novel view synthesis, camera pose estimation, object 6D pose estimation, and object surface reconstruction. Our experiments show that the large-scale capture of RGB-D objects provides a large potential to advance 3D object learning. Our project page is https://wildrgbd.github.io/.
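The abstract mentions point clouds aggregated from RGB-D videos using object masks and real-world scale camera poses. As a rough illustration of that idea only, the sketch below back-projects masked depth pixels through a pinhole camera model and merges the frames in a shared world frame. Everything in it is an assumption made for illustration: the function names `backproject` and `aggregate`, the nested-list data layout, depth stored as z-depth in metres, and poses given as 4x4 camera-to-world matrices. It does not reflect the dataset's actual file format or the authors' reconstruction pipeline.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = Sequence[Sequence[float]]      # row-major image: grid[v][u]
Pose = Sequence[Sequence[float]]      # 4x4 camera-to-world matrix (assumed convention)


def backproject(depth: Grid, fx: float, fy: float, cx: float, cy: float,
                cam_to_world: Pose, mask: Optional[Sequence[Sequence[bool]]] = None
                ) -> List[Point]:
    """Lift one depth map to world-space points with a pinhole model.

    Assumes depth[v][u] is the z-depth in metres and 0 marks a missing reading.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # skip invalid depth
            if mask is not None and not mask[v][u]:
                continue  # keep only pixels on the object
            # Pinhole back-projection into the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
            # Rigid transform into the world frame (first three rows of the pose).
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4)) for r in range(3)
            )
            points.append(world)
    return points


def aggregate(frames, fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Merge per-frame points; `frames` yields (depth, mask, cam_to_world) triples."""
    cloud: List[Point] = []
    for depth, mask, pose in frames:
        cloud.extend(backproject(depth, fx, fy, cx, cy, pose, mask))
    return cloud
```

Because the poses are assumed to be at real-world scale, the merged points share metric units across frames. A practical pipeline would normally add outlier filtering and downsampling, which this sketch leaves out.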

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to construct a large-scale real-world RGB-D (Red-Green-Blue-Depth) object dataset called WildRGB-D. Unlike most existing real-world object datasets, which contain only RGB images, it captures the depth channel directly, which allows better 3D annotations and a wider range of applications. Specifically, the paper attempts to address the following issues:

1. **Lack of Large-Scale Real-World RGB-D Data**:
   - Existing 3D object datasets are mostly synthetic or only partially real scanned data, and lack large-scale real-world multi-view RGB-D videos.
   - This limits model performance in real-world applications, because synthetic data struggles to reproduce real textures, shapes, backgrounds, and natural lighting.
2. **3D Object Learning in Multi-View and Complex Scenes**:
   - Existing datasets usually cover limited viewing angles and scenes, and fail to reflect the diversity and complexity of the real world.
   - WildRGB-D records 360-degree videos in three setups (a single object, multiple objects, and an object held in a static hand), which increases the diversity and complexity of the data.
3. **Performance Improvement in Downstream Tasks**:
   - The dataset is used to benchmark four downstream tasks: novel view synthesis, camera pose estimation, object 6D pose estimation, and object surface reconstruction.
   - Experimental results show that large-scale RGB-D capture has great potential for 3D object learning, especially in tasks such as novel view synthesis and camera pose estimation.
4. **Application of Self-Supervised Learning**:
   - The paper also explores self-supervised learning for object 6D pose estimation, demonstrating that effective self-supervised training is possible with large-scale RGB-D images even without training labels.

In summary, by constructing the large-scale WildRGB-D dataset, this paper aims to address the shortcomings of existing datasets in real-world 3D object learning, promoting research and applications in related fields.

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

Salient Object Detection in RGB-D Videos

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications

Weakly-Supervised RGBD Video Object Segmentation

DepthTrack: Unveiling the Power of RGBD Tracking

Matterport3D: Learning from RGB-D Data in Indoor Environments

RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Recognizing Objects In-the-wild: Where Do We Stand?

ClearPose: Large-scale Transparent Object Dataset and Benchmark

360 in the Wild: Dataset for Depth Prediction and View Synthesis

RGBD Object Tracking: An In-depth Review

ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data

Caltech Aerial RGB-Thermal Dataset in the Wild

A Large Scale RGB-D Dataset for Action Recognition

Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset

SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments

ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

Single-Image Depth Perception in the Wild

Towards Long-term Robotics in the Wild