Abstract: Buildings are a central feature of human culture and are increasingly being analyzed with computational methods. However, recent works on computational building understanding have largely focused on natural imagery of buildings, neglecting the fundamental element defining a building's structure -- its floorplan. Conversely, existing works on floorplan understanding are extremely limited in scope, often focusing on floorplans of a single semantic category and region (e.g., floorplans of apartments from a single country). In this work, we introduce WAFFLE, a novel multimodal floorplan understanding dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. By using a large language model and multimodal foundation models, we curate and extract semantic information from these images and their accompanying noisy metadata. We show that WAFFLE enables progress on new building understanding tasks, both discriminative and generative, which were not feasible using prior datasets. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.

What problem does this paper attempt to address?

The paper addresses the narrow scope of existing work on automatic floorplan understanding, which usually focuses on a single semantic category and region (for example, apartments in one country). This contrasts sharply with the diversity of building shapes and sizes in the real world, which reflects the different uses of buildings. To address this, the authors introduce WAFFLE, a new multimodal dataset of nearly 20,000 floorplan images and their metadata, collected from the Internet and covering a variety of building types, geographic locations, and data formats. Specifically, the paper addresses the following issues:

1. **Lack of diversity and broad applicability**: Most existing floorplan understanding methods are limited to specific building types (such as residences) and specific geographic regions, and cannot cover the diversity of real-world buildings. WAFFLE provides a broader sample by including floorplans of buildings from different countries, periods, and uses.
2. **Automated processing and annotation**: Traditional methods rely on manual annotation, which is time-consuming and costly. WAFFLE uses large language models (LLMs) and vision-language models (VLMs) to process and annotate images and metadata automatically, reducing the need for human intervention.
3. **New tasks and challenges**: Existing datasets cannot support some new building understanding tasks. WAFFLE serves as a challenging benchmark and also enables new floorplan understanding tasks, such as predicting building types and generating floorplans that conform to specific structural configurations.
4. **Open-vocabulary segmentation**: To better capture local semantic information in floorplans, the authors fine-tune a text-driven segmentation model using the Grounded Architectural Features (GAFs) in WAFFLE, achieving open-vocabulary floorplan segmentation and improving the model's generalization to diverse data.

In summary, WAFFLE aims to address the shortcomings of existing floorplan understanding methods in diversity and automated processing, and provides a more challenging and widely applicable foundation for future research.
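The automated annotation step mentioned above uses an LLM to extract semantic information from noisy Internet metadata. The summary does not give the authors' prompts or output schema, so the following is only a minimal sketch of the general prompt-and-parse pattern, assuming a generic text-in/text-out LLM callable and a hypothetical three-field schema (`FloorplanRecord`, `PROMPT_TEMPLATE`, `extract_record` and `fake_llm` are illustrative names, not part of WAFFLE's code).

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FloorplanRecord:
    """Hypothetical structured fields extracted from one image's metadata."""
    building_name: Optional[str]
    building_type: Optional[str]
    country: Optional[str]


# Hypothetical prompt and schema; not the paper's actual prompt.
PROMPT_TEMPLATE = (
    "Extract the building name, building type, and country from the "
    "image metadata below. Answer only with a JSON object with the keys "
    '"building_name", "building_type" and "country"; use null for unknown '
    "values.\n\nMetadata:\n{metadata}"
)

FIELDS = ("building_name", "building_type", "country")


def _clean(value: object) -> Optional[str]:
    """Return a stripped non-empty string, or None for anything else."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def extract_record(metadata: str, llm: Callable[[str], str]) -> FloorplanRecord:
    """Prompt an LLM (any text-in/text-out callable) and parse its JSON reply."""
    reply = llm(PROMPT_TEMPLATE.format(metadata=metadata))
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        parsed = None
    # Treat malformed or non-object replies as "nothing extracted".
    if not isinstance(parsed, dict):
        parsed = {}
    return FloorplanRecord(**{key: _clean(parsed.get(key)) for key in FIELDS})


# Stand-in for a real LLM call, returning a canned reply for illustration only.
def fake_llm(prompt: str) -> str:
    return '{"building_name": "Example Abbey", "building_type": "abbey", "country": "France"}'


record = extract_record("File:Example_Abbey_plan.jpg; Category:Abbeys in France", fake_llm)
print(record.building_type)  # → abbey
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model or API; a real client would only need to be wrapped in a function from prompt string to reply string. Falling back to `None` on malformed replies reflects the noisy-metadata setting, where some records simply yield no usable fields.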

WAFFLE: Multimodal Floorplan Understanding in the Wild

MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes

Data-driven floor plan understanding in rural residential buildings via deep recognition

Floor plan generation: The interplay among data, machine, and designer

SenseWit: Pervasive Floorplan Generation Based on Only Inertial Sensing

Pervasive Floorplan Generation Based on Only Inertial Sensing: Feasibility, Design, and Implementation

Automatic Rendering of Building Floor Plan Images from Textual Descriptions in English

FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans

Building Floorspace in China: A Dataset and Learning Pipeline

Unveiling Spaces: Architecturally meaningful semantic descriptions from images of interior spaces

Semantic-aware Room-Level Indoor Modeling from Point Clouds

Layer-Wise Floorplan Extraction for Automatic Urban Building Reconstruction

Walk2Map: Extracting Floor Plans from Indoor Walk Trajectories

Deep learning-based text detection and recognition on architectural floor plans

SUGAMAN: Describing Floor Plans for Visually Impaired by Annotation Learning and Proximity based Grammar

Emergency Floor Plan Digitization Using Machine Learning

Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment

FloorplanGAN: Vector Residential Floorplan Adversarial Generation

A Hybrid Semantic-Geometric Approach for Clutter-Resistant Floorplan Generation from Building Point Clouds

Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans

ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset