Abstract: Accurate and efficient classification maps of urban functional zones (UFZs) are crucial to urban planning, management, and decision making. Because UFZs have complex socioeconomic properties, it is increasingly challenging to identify them from remote-sensing images (RSIs) alone. Point-of-interest (POI) data complement RSI data in UFZ extraction. However, many existing methods use only a single type of data or simply combine the two, and so fail to exploit their complementary strengths. We therefore designed a deep-learning framework that integrates both types of data to identify UFZs. The first part is the complementary feature-learning and fusion module, in which convolutional neural networks (CNNs) extract visual and social features. Specifically, visual features are extracted from RSI data, while POI data are converted into a distance heatmap tensor that is fed into a CNN with gated attention mechanisms to extract social features. A feature fusion module (FFM) with adaptive weights then fuses the two types of features. The second part is the spatial-relationship-modeling module. We designed a new spatial-relationship-learning network based on a vision transformer with long- and short-distance attention, which simultaneously learns the global and local spatial relationships of UFZs. Finally, a feature aggregation module (FGM) combines the two spatial relationships efficiently. The experimental results show that the proposed model effectively extracts visual, social, and spatial-relationship features from RSIs and POIs, yielding more accurate UFZ recognition.
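To make two of the steps above concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each channel of the distance heatmap tensor holds, for one POI category, the distance from every grid cell centre to the nearest POI of that category, and that adaptive-weight fusion can be approximated by a softmax over two scalar logits. The function names `poi_distance_heatmap` and `adaptive_fuse`, the cap `max_dist`, and the logits are hypothetical; in a trained model the weights would be learned.

```python
import math


def poi_distance_heatmap(pois, categories, height, width, max_dist=10.0):
    """Build a [category][row][col] tensor of nearest-POI distances.

    pois: list of (x, y, category) tuples in grid units (an assumed format).
    Cells with no POI of a category within reach are capped at max_dist.
    """
    heatmap = []
    for cat in categories:
        points = [(x, y) for x, y, c in pois if c == cat]
        channel = []
        for row in range(height):
            line = []
            for col in range(width):
                cx, cy = col + 0.5, row + 0.5  # cell centre
                nearest = min(
                    (math.hypot(cx - x, cy - y) for x, y in points),
                    default=max_dist,  # no POI of this category at all
                )
                line.append(min(nearest, max_dist))
            channel.append(line)
        heatmap.append(channel)
    return heatmap


def adaptive_fuse(visual, social, logit_visual, logit_social):
    """Fuse two equal-length feature vectors with softmax-normalised weights.

    The two logits stand in for learned parameters (an assumption).
    """
    m = max(logit_visual, logit_social)  # subtract max for numerical stability
    ev = math.exp(logit_visual - m)
    es = math.exp(logit_social - m)
    w_visual, w_social = ev / (ev + es), es / (ev + es)
    return [w_visual * v + w_social * s for v, s in zip(visual, social)]


if __name__ == "__main__":
    grid = poi_distance_heatmap([(0.5, 0.5, "food")], ["food", "school"], 2, 2)
    print(grid[0])  # distances to the single "food" POI
    print(adaptive_fuse([1.0, 2.0], [3.0, 4.0], 0.0, 0.0))
```

With equal logits the two modalities contribute equally; raising one logit shifts the fused vector toward that modality, which is the behaviour the adaptive weights are meant to provide.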
