Building Usage Prediction in Complex Urban Scenes By Fusing Text and Facade Features from Street View Images Using Deep Learning

Surya Prasath Ramalingam, Vaibhav Kumar
DOI: https://doi.org/10.1016/j.buildenv.2024.112174
IF: 7.093
2024-10-13
Building and Environment
Abstract: Building usage maps are inputs to many urban planning applications; however, existing methods and available data are limited in their ability to generate instance-level, high-resolution usage maps. In this study, we tackle this problem by using Street View Images (SVIs) and proposing a novel ensemble learning architecture that leverages building façade features and text extracted from hoardings, posters, etc. on buildings to predict the usage class. A pre-trained object detection model, DINO, is used to efficiently identify buildings. A novel, manually labeled training dataset that maps detected buildings to their usage is used to extract features from building façades across diverse Indian cities (Hyderabad, Mumbai, Bangalore, Delhi) with a Vision Transformer (ViT) model. Following this, CLIPSeg, a pre-trained segmentation model, recognizes text specifically on building elements such as signs, posters, and banners. We then leverage GPT-3.5 Turbo, a Large Language Model (LLM), fine-tuned with a specifically designed prompting method, to infer building usage from the recognized text. To achieve optimal performance, the proposed ensemble linear metaclassifier combines the predictions of the ViT and LLM models. The predicted building usages are attributed to their corresponding locations to develop spatial maps. An analysis of our framework against ground truth data collected from various Indian cities reveals highly accurate outcomes. Our findings highlight the utility of textual information in classifying utility and commercial buildings, while features extracted from vision models prove more informative for residential buildings. Our approach can automate the generation of roadside building attributes and usage details on a larger scale.
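The idea of a linear metaclassifier over two base models can be illustrated with a small sketch. The abstract does not specify the metaclassifier's inputs, training procedure, or class set, so everything below is an assumption: the three class names, the use of concatenated class probabilities as features, the softmax-regression training loop, and the toy numbers (which are invented for illustration and are not results from the paper).

```python
import math

# Hypothetical label set; the abstract does not list the paper's classes.
CLASSES = ["residential", "commercial", "utility"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    """Linear scores W.x + b, one per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train_meta(features, labels, n_classes, epochs=2000, lr=0.5):
    """Fit a softmax-regression (linear) meta-classifier by batch gradient descent."""
    n, n_feat = len(features), len(features[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(features, labels):
            p = softmax(logits_for(W, b, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                gb[k] += err
                for j in range(n_feat):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(n_feat):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict_meta(W, b, vit_probs, llm_probs):
    """Concatenate both models' class probabilities and return the top class index."""
    scores = logits_for(W, b, vit_probs + llm_probs)
    return max(range(len(scores)), key=scores.__getitem__)


# Invented toy data: (vision-model probabilities, text-model probabilities, label).
TOY = [
    ([0.8, 0.1, 0.1], [0.34, 0.33, 0.33], 0),
    ([0.7, 0.2, 0.1], [0.40, 0.30, 0.30], 0),
    ([0.4, 0.4, 0.2], [0.10, 0.80, 0.10], 1),
    ([0.5, 0.3, 0.2], [0.10, 0.70, 0.20], 1),
    ([0.4, 0.3, 0.3], [0.10, 0.10, 0.80], 2),
    ([0.5, 0.2, 0.3], [0.20, 0.10, 0.70], 2),
]

W, b = train_meta([v + t for v, t, _ in TOY], [y for _, _, y in TOY], len(CLASSES))

# A building whose facade looks ambiguous but whose signage text is decisive.
idx = predict_meta(W, b, [0.45, 0.35, 0.20], [0.05, 0.85, 0.10])
print(CLASSES[idx])
```

In this sketch the learned weights decide how much to trust each base model per class, which is one way a stacked ensemble could favor text cues for some classes and visual cues for others.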
engineering, environmental, construction & building technology, civil