Abstract: Urban land use classification plays a significant role in urban studies and provides key guidance for urban development. However, existing methods predominantly rely on either raster structure deep features extracted by convolutional neural networks (CNNs) or topological structure deep features extracted by graph neural networks (GNNs), which makes it difficult to capture the rich semantic information in remote sensing images comprehensively. To address this limitation, we propose a novel urban land use classification model that integrates both raster and topological structure deep features to improve classification accuracy and robustness. First, we divide the urban area into block units based on road network data and further subdivide these units using the fractal network evolution algorithm (FNEA). Next, a K-nearest neighbors (KNN) graph construction method with adaptive fusion coefficients is employed to generate global and local graphs of the blocks and sub-units. Spectral features and subgraph features are then constructed, and a graph convolutional network (GCN) is used to extract node relational features from both the global and local graphs, forming the topological structure deep features while aggregating local features into global ones. Subsequently, VGG-16 (Visual Geometry Group 16) is used to extract the image convolutional features of the block units, yielding the raster structure deep features. Finally, a transformer fuses the topological and raster structure deep features, and land use classification is completed using the softmax function. Experiments were conducted using high-resolution Google images and OpenStreetMap (OSM) data, with study areas on the third ring road of Shenyang and the fourth ring road of Chengdu. The results demonstrate that the proposed method improves the overall accuracy and Kappa coefficient by 9.32% and 0.17, respectively, compared to single deep learning models. Incorporating subgraph structure features further improves the overall accuracy and Kappa coefficient by 1.13% and 0.1, respectively. The adaptive KNN graph construction method achieves accuracy comparable to that of the empirical threshold method. This study enables accurate large-scale urban land use classification with reduced manual intervention, improving urban planning efficiency. The experimental results verify the effectiveness of the proposed method, particularly in terms of classification accuracy and the completeness of feature representation.
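
The graph-based step described above (linking block units by nearest neighbors, then propagating features along the edges) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: a fixed k replaces the adaptive fusion coefficients, the propagation step has no learned weights or nonlinearity, and the function names `knn_graph` and `gcn_propagate` and the toy block features are hypothetical.

```python
import math


def knn_graph(features, k):
    """Build an undirected KNN graph as a list of neighbor sets.

    Assumption: a fixed k is used here; the adaptive fusion
    coefficients mentioned in the abstract are not modeled.
    """
    n = len(features)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Rank all other nodes by Euclidean distance to node i.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        for j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize so the graph is undirected
    return adj


def gcn_propagate(features, adj):
    """One GCN-style propagation step: D^-1/2 (A + I) D^-1/2 X.

    Learned weights and the activation function are omitted, so this
    shows only the neighborhood aggregation part of a GCN layer.
    """
    n = len(features)
    dim = len(features[0])
    deg = [len(adj[i]) + 1 for i in range(n)]  # +1 for the self-loop
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in adj[i] | {i}:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for d in range(dim):
                out[i][d] += w * features[j][d]
    return out


# Hypothetical 2-D features for five block units forming two clusters.
blocks = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
graph = knn_graph(blocks, k=1)
smoothed = gcn_propagate(blocks, graph)
```

In this toy example the graph contains no edge between the two clusters, so propagation mixes features only among nearby blocks. A full model along the lines of the abstract would add trainable weight matrices, stack several layers, and fuse the result with CNN features.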

Comprehensive urban space representation with varying numbers of street-level images

High Spatial-Resolution Classification of Urban Surfaces Using a Deep Learning Method

Multi-level Urban Street Representation with Street-View Imagery and Hybrid Semantic

Using Street-level Images and Deep Learning for Urban Landscape STUDIES

Simultaneous Extraction of Spatial and Attributional Building Information Across Large-Scale Urban Landscapes from High-Resolution Satellite Imagery

A Deep-Learning-Based Multimodal Data Fusion Framework for Urban Region Function Recognition

StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model

Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images

Mixed land use measurement and mapping with street view images and spatial context-aware prompts via zero-shot multimodal learning

A Spatial Analysis of Urban Streets under Deep Learning Based on Street View Imagery: Quantifying Perceptual and Elemental Perceptual Relationships

Integrating Aerial and Street View Images for Urban Land Use Classification

Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution

Urban Land Use Classification Model Fusing Multimodal Deep Features

Predicting Multi-level Socioeconomic Indicators from Structural Urban Imagery

Learning visual features from figure-ground maps for urban morphology discovery

Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding

Social sensing from street-level imagery: A case study in learning spatio-temporal urban mobility patterns

A multimodal fusion framework for urban scene understanding and functional identification using geospatial data

Long-Term Annual Mapping Of Four Cities On Different Continents By Applying A Deep Information Learning Method To Landsat Data

Model Fusion for Building Type Classification from Aerial and Street View Images

An Interpretable Machine Learning Framework for Measuring Urban Perceptions from Panoramic Street View Images