Abstract: Building façade elements are an important foundation for smart cities. Because buildings exhibit a wide array of textures and geometric forms, image acquisition is easily affected and texture is not robust in some scenes (e.g., dilapidated buildings), while point clouds involve large data volumes and low recognition efficiency; therefore, the accuracy of building element extraction based on a single data source remains limited. In this research, a method for building façade element extraction based on multidimensional virtual semantic feature map ensemble learning and hierarchical clustering is proposed. Point clouds were generated from multi-view images, and the multidimensional virtual semantic feature maps, including color, texture, orientation, and curvature semantics, were then acquired via reprojection. A multi-semantic feature block pre-segmentation that considers multiple features was obtained by ensemble learning, and a hierarchical clustering strategy was established to achieve fine extraction of building façade elements. Experiments were conducted across multiple building types, and the results showed that: 1) the method can use different virtual semantic feature maps and clustering strategies to achieve accurate extraction of diverse building façade elements; 2) the method achieved joint learning tasks in both 2D and 3D space; and 3) the method achieved fine extraction of building elements with pixel accuracy (PA) over 70% in all experiments and mean intersection over union (mIoU) up to 95%, outperforming the image-based method. In summary, this work offers a novel, more reliable approach for segmenting and extracting building façade elements, which has important theoretical and practical significance.
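The abstract reports results as pixel accuracy (PA) and mean intersection over union (mIoU) but does not define them. The sketch below uses the common confusion-matrix definitions, which are an assumption here and may differ from the exact evaluation protocol of the paper (for example, in how absent classes are averaged). The class labels and the toy data are hypothetical and purely illustrative.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count label pairs; rows are ground truth, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """PA: correctly labelled pixels divided by all pixels."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_iou(matrix):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # truth c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp    # predicted c, truth otherwise
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy labels: 0 = wall, 1 = window, 2 = door
truth = [0, 0, 0, 0, 1, 1, 1, 2]
pred = [0, 0, 0, 1, 1, 1, 0, 2]
m = confusion_matrix(truth, pred, 3)
print("PA:", pixel_accuracy(m))
print("mIoU:", mean_iou(m))
```

Because PA is dominated by large classes such as walls while mIoU weights every class equally, the two figures can diverge considerably on the same segmentation.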

Multiview Feature Aggregation for Facade Parsing

Progressive Feature Learning for Facade Parsing with Occlusions

DeepFacade: A Deep Learning Approach to Facade Parsing with Symmetric Loss

Building façade element extraction based on multidimensional virtual semantic feature map ensemble learning and hierarchical clustering

Voxelized 3D Feature Aggregation for Multiview Detection

Multi-layer Feature Aggregation for Deep Scene Parsing Models

Semantic Annotation for Complex Video Street Views Based on 2D–3D Multi-Feature Fusion and Aggregated Boosting Decision Forests

Building Facade Parsing R-CNN

FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction

Building Facade Recognition Using Oblique Aerial Images

Translational Symmetry-Aware Facade Parsing for 3D Building Reconstruction

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

Improving facade parsing with vision transformers and line integration

Multiview Detection with Feature Perspective Transformation

Pyramid ALKNet for Semantic Parsing of Building Facade Image

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

Data-Driven Facade Reconstruction

End-to-end multiview fusion for building mapping from aerial images

MSFA-Net: A Multiscale Feature Aggregation Network for Semantic Segmentation of Historical Building Point Clouds

Cross-Level Attentive Feature Aggregation for Change Detection

Automatic, Multiview, Coplanar Extraction for CityGML Building Model Texture Mapping