BF-SAM: Enhancing SAM Through Multi-Modal Fusion for Fine-Grained Building Function Identification

Zhaoya Gong, Binbo Li, Chenglong Wang, Jun Chen, Pengjun Zhao
DOI: https://doi.org/10.1080/13658816.2024.2399142
2024-01-01
International Journal of Geographical Information Science
Abstract: Building function identification (BFI) is crucial for urban planning and governance. The traditional remote sensing approach primarily focuses on extracting the physical features of buildings, overlooking their functional uses. Recently, progress has been made in urban functional area identification through multi-modal representation learning from multi-source spatial big data. However, the two approaches are disconnected, and neither alone is adequate to tackle the fine-grained BFI problem. To address this challenge, this study proposes a multi-modal foundation model for BFI, called BF-SAM, by fine-tuning a large visual model, the Segment Anything Model (SAM), with multi-modal features related to urban functions. This model harnesses the segmentation capability of SAM for building delineation and fuses it with multi-modal representation learning for functional identification through a novel multi-modal fine-tuning method for SAM. Modality-dedicated feature extraction methods are devised to learn geographic features from road networks, population density, and points of interest. The validity of BF-SAM was evaluated on datasets from Munich, Beijing, Suzhou, and Hefei, and the importance of multi-modal geographic features was examined through extensive experiments. BF-SAM outperformed a series of benchmarks. The potential transferability of BF-SAM was further explored under different spatial contexts.