Abstract: Accurate and efficient classification maps of urban functional zones (UFZs) are crucial to urban planning, management, and decision making. Because UFZs have complex socioeconomic properties, it is increasingly difficult to identify them from remote-sensing images (RSIs) alone. Point-of-interest (POI) data and RSI data both play important roles in UFZ extraction. However, many existing methods use only a single type of data or simply combine the two, and so fail to exploit the complementary strengths of the two sources. Therefore, we designed a deep-learning framework that integrates the two types of data to identify UFZs. The first part of the framework is the complementary feature-learning and fusion module, in which we use a convolutional neural network (CNN) to extract visual features and social features. Specifically, we extract visual features from RSI data, while POI data are converted into a distance heatmap tensor that is fed into a CNN with gated attention mechanisms to extract social features. We then use a feature fusion module (FFM) with adaptive weights to fuse the two types of features. The second part is the spatial-relationship-modeling module. We designed a new spatial-relationship-learning network based on a vision transformer model with long- and short-distance attention, which can simultaneously learn the global and local spatial relationships of the UFZs. Finally, a feature aggregation module (FGM) efficiently combines the two spatial relationships. The experimental results show that the proposed model can fully extract visual features, social features, and spatial relationship features from RSIs and POIs for more accurate UFZ recognition.
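
The two data-preparation and fusion steps described above (turning POIs into a distance heatmap tensor, then fusing visual and social features with adaptive weights) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not define the heatmap or the weighting scheme, so the sketch assumes one channel per POI category, a normalized nearest-POI Euclidean distance per grid cell, and a softmax over two scalar logits as the adaptive weights. The names `poi_distance_heatmap`, `adaptive_fuse`, and all of their parameters are hypothetical.

```python
import numpy as np


def poi_distance_heatmap(pois, n_categories, grid_size):
    """Build a (n_categories, H, W) tensor from POIs (assumed encoding).

    pois: iterable of (row, col, category) in grid-cell coordinates.
    Each cell holds 1.0 at a POI of that category and decays linearly
    to 0.0 at the grid diagonal; channels with no POIs are all 0.0.
    """
    h, w = grid_size
    rows, cols = np.mgrid[0:h, 0:w]
    # Largest possible distance inside the grid, used for normalization.
    max_dist = max(float(np.hypot(h - 1, w - 1)), 1.0)
    dist = np.full((n_categories, h, w), max_dist, dtype=np.float32)
    for r, c, k in pois:
        d = np.hypot(rows - r, cols - c)
        dist[k] = np.minimum(dist[k], d)  # keep the nearest POI per cell
    return 1.0 - np.clip(dist / max_dist, 0.0, 1.0)


def adaptive_fuse(visual, social, logits):
    """Weighted sum of two feature arrays (assumed fusion rule).

    logits: array of two scalars, standing in for learned parameters;
    a softmax turns them into weights that sum to 1.
    """
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    return weights[0] * visual + weights[1] * social


if __name__ == "__main__":
    heat = poi_distance_heatmap([(0, 0, 0), (2, 3, 1)], n_categories=3, grid_size=(4, 4))
    print(heat.shape)
    fused = adaptive_fuse(np.ones(8), np.zeros(8), np.array([0.0, 0.0]))
    print(fused)
```

In a trained model the two logits would be learned jointly with the CNN branches, and the fusion would more likely operate per channel or per location; the scalar version here only shows the weighting idea.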
