Remote Sensing and Time Series Data Fused Multimodal Prediction Model Based on Interaction Analysis

Zhiwei Zhang, Dong Wang
DOI: https://doi.org/10.1145/3376067.3376100
2019-01-01
Abstract: As daily life becomes more modern, people experience their surroundings in richer ways: they see attractive scenery, hear pleasant sounds, smell fragrances, touch soft objects, and taste delicious food. Each of these channels of experience can be described as a "modality". Because modalities are heterogeneous, learning from several of them at once has become an emerging research topic. Multimodal machine learning has a wide range of applications but still faces many challenges, which fall into five categories: representation, translation, alignment, fusion, and co-learning. In this paper, we focus on the representation and fusion problems of multimodal learning and address a practical task, urban functional area classification. We propose a scalable interaction model, based on the Squeeze-and-Excitation block, that fuses the image modality from remote sensing images with the temporal modality from user visit sequences. Our model improves over typical multimodal methods.
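The fusion idea in the abstract can be illustrated with a minimal sketch of Squeeze-and-Excitation style gating applied to two modality feature vectors. This is not the authors' model: the feature sizes, the reduction ratio of 2, the random weights, and the names `se_fuse`, `w_reduce`, and `w_expand` are all assumptions made for illustration. The sketch assumes each modality has already been encoded into a flat feature vector, so the "squeeze" step reduces to concatenation.

```python
import math
import random


def matvec(weights, vec):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def se_fuse(image_feat, visit_feat, w_reduce, w_expand):
    """SE-style gating over the concatenated features of two modalities.

    Hypothetical helper: a bottleneck (ReLU) followed by a sigmoid produces
    one weight in (0, 1) per channel, which rescales the joint features.
    """
    joint = list(image_feat) + list(visit_feat)
    # Excitation, step 1: reduce to a smaller hidden vector with ReLU.
    hidden = [max(0.0, h) for h in matvec(w_reduce, joint)]
    # Excitation, step 2: expand back and squash to per-channel gates.
    gate = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w_expand, hidden)]
    # Rescale every channel of both modalities by its gate.
    return [x * g for x, g in zip(joint, gate)]


random.seed(0)
# Assumed sizes: 8 image channels, 6 visit-sequence channels.
image_feat = [random.gauss(0, 1) for _ in range(8)]
visit_feat = [random.gauss(0, 1) for _ in range(6)]
channels = len(image_feat) + len(visit_feat)
hidden_size = channels // 2  # assumed reduction ratio of 2

# Untrained random weights, used only to make the sketch runnable.
w_reduce = [[random.gauss(0, 0.1) for _ in range(channels)]
            for _ in range(hidden_size)]
w_expand = [[random.gauss(0, 0.1) for _ in range(hidden_size)]
            for _ in range(channels)]

fused = se_fuse(image_feat, visit_feat, w_reduce, w_expand)
print(len(fused))
```

Because the gate for each channel is computed from the features of both modalities, the image features can suppress or emphasize visit-sequence channels and vice versa, which is the sense in which such a block models cross-modal interaction.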