Abstract: Humans possess the cognitive ability to comprehend scenes in a compositional manner. To empower AI systems with similar capabilities, object-centric learning aims to acquire representations of individual objects from visual scenes without any supervision. Although recent advances in object-centric learning have made remarkable progress on complex synthetic datasets, applying these methods to complex real-world scenes remains a major challenge. One essential reason is the scarcity of real-world datasets specifically tailored to object-centric learning. To address this problem, we propose OCTScenes, a versatile real-world dataset of tabletop scenes for object-centric learning, meticulously designed to serve as a benchmark for comparing, evaluating, and analyzing object-centric learning methods. OCTScenes contains 5000 tabletop scenes featuring a total of 15 objects. Each scene is captured in 60 frames covering a 360-degree perspective. Consequently, OCTScenes is a versatile benchmark that can simultaneously support the evaluation of single-image, video, and multi-view object-centric learning methods. Extensive experiments with representative object-centric learning methods are conducted on OCTScenes. The results demonstrate the shortcomings of state-of-the-art methods in learning meaningful representations from real-world data, despite their impressive performance on complex synthetic datasets. Furthermore, OCTScenes can serve as a catalyst for the advancement of existing methods, inspiring them to adapt to real-world scenes. Dataset and code are available at https://huggingface.co/datasets/Yinxuan/OCTScenes.
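Because every scene is captured as 60 frames around a full 360-degree orbit, the same frames can be regrouped into single-image samples, short video clips, or sets of widely separated views. The sketch below illustrates that idea under stated assumptions: it assumes the 60 frames are evenly spaced in azimuth (6 degrees apart), which the abstract does not state explicitly, and every name in it (`FrameRef`, `single_image_samples`, `video_clip`, `multi_view_set`) is hypothetical. It does not reflect the actual OCTScenes file layout or official loader.

```python
from dataclasses import dataclass

# Figures taken from the abstract.
NUM_SCENES = 5000
FRAMES_PER_SCENE = 60
FULL_CIRCLE_DEG = 360.0


@dataclass(frozen=True)
class FrameRef:
    """Hypothetical reference to one frame of one scene."""
    scene_id: int
    frame_idx: int

    @property
    def azimuth_deg(self) -> float:
        # ASSUMPTION: frames are evenly spaced around the 360-degree orbit,
        # so consecutive frames are 360 / 60 = 6 degrees apart.
        return self.frame_idx * FULL_CIRCLE_DEG / FRAMES_PER_SCENE


def single_image_samples(scene_id: int) -> list[FrameRef]:
    """Single-image setting: every frame is treated as an independent image."""
    return [FrameRef(scene_id, i) for i in range(FRAMES_PER_SCENE)]


def video_clip(scene_id: int, start: int, length: int) -> list[FrameRef]:
    """Video setting: consecutive frames, wrapping around the closed orbit."""
    return [FrameRef(scene_id, (start + k) % FRAMES_PER_SCENE) for k in range(length)]


def multi_view_set(scene_id: int, num_views: int, offset: int = 0) -> list[FrameRef]:
    """Multi-view setting: num_views frames spread evenly around the scene."""
    if num_views <= 0 or FRAMES_PER_SCENE % num_views != 0:
        raise ValueError("num_views must be a positive divisor of FRAMES_PER_SCENE")
    step = FRAMES_PER_SCENE // num_views
    return [FrameRef(scene_id, (offset + k * step) % FRAMES_PER_SCENE) for k in range(num_views)]


if __name__ == "__main__":
    print(NUM_SCENES * FRAMES_PER_SCENE)                           # total frames: 300000
    print([v.azimuth_deg for v in multi_view_set(0, num_views=4)])  # [0.0, 90.0, 180.0, 270.0]
    print([f.frame_idx for f in video_clip(0, start=58, length=4)]) # [58, 59, 0, 1]
```

Wrapping indices modulo 60 reflects the 360-degree capture: the last frame sits next to the first, so a clip or view set may cross the start of the orbit without special handling.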

Furnishing Your Room by What You See: An End-to-End Furniture Set Retrieval Framework with Rich Annotated Benchmark Dataset

FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes

Web3D-based Automatic Furniture Layout System Using Recursive Case-Based Reasoning and Floor Field

CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding

3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics

Deep Layout of Custom-size Furniture through Multiple-domain Learning

RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

Fuzzy-based indoor scene modeling with differentiated examples

3D-FUTURE: 3D Furniture Shape with TextURE

InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset

End-to-end Generative Floor-plan and Layout with Attributes and Relation Graph

Deep Reinforcement Learning for Producing Furniture Layout in Indoor Scenes

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

A Data-driven Approach for Furniture and Indoor Scene Colorization

OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning

TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes

What can I do around here? Deep functional scene understanding for cognitive robots

Active Arrangement of Small Objects in 3D Indoor Scenes

Semantic-aware Room-Level Indoor Modeling from Point Clouds

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images