WAFFLE: Multimodal Floorplan Understanding in the Wild

Keren Ganon, Morris Alper, Rachel Mikulinsky, Hadar Averbuch-Elor
2024-12-02
Abstract: Buildings are a central feature of human culture and are increasingly being analyzed with computational methods. However, recent works on computational building understanding have largely focused on natural imagery of buildings, neglecting the fundamental element defining a building's structure -- its floorplan. Conversely, existing works on floorplan understanding are extremely limited in scope, often focusing on floorplans of a single semantic category and region (e.g. floorplans of apartments from a single country). In this work, we introduce WAFFLE, a novel multimodal floorplan understanding dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. By using a large language model and multimodal foundation models, we curate and extract semantic information from these images and their accompanying noisy metadata. We show that WAFFLE enables progress on new building understanding tasks, both discriminative and generative, which were not feasible using prior datasets. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the narrow scope of existing work on automatic floorplan understanding, which usually focuses on a single semantic category and region (for example, apartments in one country). This contrasts sharply with the diversity of building shapes and sizes in the real world, which reflects the different uses of buildings. To address this, the authors introduce WAFFLE, a new multimodal dataset containing nearly 20,000 floorplan images and their metadata. The data are collected from the Internet and cover a variety of building types, geographical locations, and data formats. Specifically, the paper addresses the following issues:
1. **Lack of diversity and wide applicability**: Most existing floorplan understanding methods are limited to specific types of buildings (such as residences) and specific geographical areas, so they cannot cover the diversity of real buildings. The WAFFLE dataset provides a broader sample by including building floorplans from different countries, different periods, and different uses.
2. **Automated processing and annotation**: Traditional methods rely on manual annotation, which is time-consuming and costly. WAFFLE uses large language models (LLMs) and vision-language models (VLMs) to process and annotate images and metadata automatically, reducing the need for human intervention.
3. **New tasks and challenges**: Existing datasets cannot support some new building understanding tasks. WAFFLE serves as a challenging benchmark and also makes new floorplan understanding tasks possible, such as predicting building types and generating floorplans that conform to specific structural configurations.
4. **Open-vocabulary segmentation**: To better capture local semantic information in floorplans, the authors fine-tuned a text-driven segmentation model using Grounded Architectural Features (GAFs) in WAFFLE, achieving open-vocabulary floorplan segmentation and improving the model's generalization on diverse data.
In summary, the WAFFLE dataset aims to address the shortcomings of existing floorplan understanding methods in diversity and automated processing, and it provides a more challenging and widely applicable foundation for future research.