A Deep Multi-Modal Fusion Approach for Semantic Place Prediction in Social Media

Kaidi Meng, Haojie Li, Zhihui Wang, Xin Fan, Fuming Sun, Zhongxuan Luo
DOI: https://doi.org/10.1145/3132515.3132519
2017-01-01
Abstract: Semantic places such as "home," "work," and "school" are much easier to understand compared to GPS coordinates or street addresses and contribute to the automatic inference of related activities, which could further help in the study of personal lifestyle patterns and the provision of more customized services for human beings. In this work, we present a feature-level fusion method for semantic place prediction that utilizes user-generated text-image pairs from online social media as input. To take full advantage of each specific modality, we concatenate features from two state-of-the-art Convolutional Neural Networks (CNNs) and train them together. To the best of our knowledge, the present study is the first attempt to conduct semantic place prediction based only on microblogging multimedia content. The experimental results demonstrate that our deep multi-modal architecture outperforms single-modal methods and the traditional fusion method.
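The feature-level fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors below are random stand-ins for the outputs of the text and image CNNs, the dimensions (4 and 6), the single linear layer, and the helper name `fuse_and_classify` are assumptions made for illustration, and no training is performed. It only shows the core idea of concatenating the two modality features and classifying the fused vector into semantic places.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_feat, image_feat, weights, biases):
    """Hypothetical helper: concatenate modality features, then apply one linear layer.

    Feature-level fusion here means the classifier sees a single joint vector
    instead of combining two separate per-modality predictions.
    """
    fused = list(text_feat) + list(image_feat)  # concatenation step
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return fused, softmax(scores)


# Place labels taken from the abstract's examples.
PLACES = ["home", "work", "school"]

rng = random.Random(0)
# Assumption: stand-ins for CNN outputs; a real system would compute these
# from the text and the image of a social media post.
text_feat = [rng.uniform(-1, 1) for _ in range(4)]
image_feat = [rng.uniform(-1, 1) for _ in range(6)]

# Assumption: randomly initialised classifier weights (untrained).
weights = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in PLACES]
biases = [0.0 for _ in PLACES]

fused, probs = fuse_and_classify(text_feat, image_feat, weights, biases)
predicted = PLACES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In the paper's setting the two CNNs and the layers on top of the concatenated features are trained together, so gradients from the place-prediction loss would update both modality branches; the sketch above covers only the forward pass.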