Comprehensive urban space representation with varying numbers of street-level images

Yingjing Huang, Fan Zhang, Yong Gao, Wei Tu, Fabio Duarte, Carlo Ratti, Diansheng Guo, Yu Liu
DOI: https://doi.org/10.1016/j.compenvurbsys.2023.102043
IF: 6.454
2023-10-12
Computers Environment and Urban Systems
Abstract: Street-level imagery has emerged as a valuable tool for observing large-scale urban spaces with unprecedented detail. However, previous studies have been limited to analyzing individual street-level images. This approach falls short in representing the characteristics of a spatial unit, such as a street or grid, which may contain varying numbers of street-level images ranging from several to hundreds. As a result, a more comprehensive and representative approach is required to capture the complexity and diversity of urban environments at different spatial scales. To address this issue, this study proposes a deep learning-based module called Vision-LSTM, which can effectively obtain vector representation from varying numbers of street-level images in spatial units. The effectiveness of the module is validated through experiments to recognize urban villages, achieving reliable recognition results (overall accuracy: 91.6%) through multimodal learning that combines street-level imagery with remote sensing imagery and social sensing data. Compared to existing image fusion methods, Vision-LSTM demonstrates significant effectiveness in capturing associations between street-level images. The proposed module can provide a more comprehensive understanding of urban spaces, enhancing the research value of street-level imagery and facilitating multimodal learning-based urban research. Our models are available at https://github.com/yingjinghuang/Vision-LSTM.
environmental studies, geography, regional & urban planning
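
The core idea in the abstract, turning a varying number of street-level images in one spatial unit into a single fixed-length vector, can be illustrated with a small recurrent encoder. The sketch below is not the authors' implementation (that is in the linked repository) and makes several assumptions the abstract does not state: each image has already been reduced to a feature vector by some image model, the vectors are fed one by one into a single LSTM cell, and the final hidden state serves as the unit's representation. The class name `TinyLSTMEncoder`, the dimensions, and the random untrained weights are all hypothetical and chosen only to keep the example dependency-free.

```python
import math
import random


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMEncoder:
    """Hypothetical single-cell LSTM with random, untrained weights.

    Assumption: every image is represented by a feature vector of
    length `input_dim` produced elsewhere (for example by a CNN).
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Each gate has one weight row per hidden unit over
        # [input features, previous hidden state, bias term].
        width = input_dim + hidden_dim + 1
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_dim)]
            for gate in ("input", "forget", "output", "candidate")
        }

    def _gate(self, name, z, activation):
        # Dense layer followed by the gate's activation function.
        return [activation(sum(w * v for w, v in zip(row, z)))
                for row in self.weights[name]]

    def encode(self, image_features):
        """Fold any number of feature vectors into one vector."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in image_features:
            z = list(x) + h + [1.0]
            i = self._gate("input", z, sigmoid)
            f = self._gate("forget", z, sigmoid)
            o = self._gate("output", z, sigmoid)
            g = self._gate("candidate", z, math.tanh)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


if __name__ == "__main__":
    rng = random.Random(42)
    encoder = TinyLSTMEncoder(input_dim=8, hidden_dim=4)
    # Two spatial units with very different image counts.
    street_a = [[rng.random() for _ in range(8)] for _ in range(5)]
    street_b = [[rng.random() for _ in range(8)] for _ in range(200)]
    print(len(encoder.encode(street_a)), len(encoder.encode(street_b)))
```

Whatever the number of input images, `encode` returns a vector of length `hidden_dim`, which is what allows a downstream model to combine it with other fixed-size inputs such as remote sensing or social sensing features. In practice the weights would be learned with a deep learning framework rather than drawn at random.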