Multi-modal Transformer Using Two-Level Visual Features for Fake News Detection.

Wang Bin, Feng Yong, Xiong Xian-cai, Wang Yong-heng, Qiang Bao-hua
DOI: https://doi.org/10.1007/s10489-022-04055-5
IF: 5.3
2022-01-01
Applied Intelligence
Abstract: Fake news with multimedia data is ubiquitous on the Internet nowadays, and it is difficult for users to identify. Therefore, it is necessary to design automatic multi-modal fake news detectors. However, existing work makes poor use of visual information and does not fully consider the semantic interaction of multi-modal data. In this paper, we propose the multi-modal transformer using two-level visual features (MTTV) for fake news detection. First, we model the texts and images of news uniformly as sequences that can be processed by a transformer, and two-level visual features, i.e., a global feature and entity-level features, are used to improve the utilization of news images. Second, we extend the transformer model for natural language processing to a multi-modal transformer, which lets multi-modal data interact fully and captures the semantic relationships between them. In addition, we propose a scalable classifier to improve classification balance in fine-grained fake news detection, which suffers from class imbalance. Extensive experiments on two public datasets demonstrate that our method achieves significant performance improvement compared to state-of-the-art methods. The source code is available at https://github.com/cqu-wb/MTTV .
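The idea of modeling text and images uniformly as one sequence can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the names Token, build_sequence, and the modality ids below are hypothetical, and the feature vectors are assumed to be already extracted and projected to a shared embedding size.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical modality ids, chosen for illustration only.
TEXT, GLOBAL_IMG, ENTITY_IMG = 0, 1, 2


@dataclass
class Token:
    vector: List[float]  # embedding of one sequence element
    modality: int        # which kind of feature this element is
    position: int        # index in the unified sequence


def build_sequence(
    text_vecs: Sequence[Sequence[float]],
    global_vec: Sequence[float],
    entity_vecs: Sequence[Sequence[float]],
) -> List[Token]:
    """Join text embeddings, one global image feature, and
    entity-level image features into a single sequence."""
    seq: List[Token] = []
    for vec in text_vecs:
        seq.append(Token(list(vec), TEXT, len(seq)))
    seq.append(Token(list(global_vec), GLOBAL_IMG, len(seq)))
    for vec in entity_vecs:
        seq.append(Token(list(vec), ENTITY_IMG, len(seq)))

    # A transformer needs every element to share one embedding size.
    if len({len(tok.vector) for tok in seq}) != 1:
        raise ValueError("all features must share one embedding size")
    return seq


if __name__ == "__main__":
    example = build_sequence(
        text_vecs=[[0.1, 0.2], [0.3, 0.4]],
        global_vec=[0.5, 0.6],
        entity_vecs=[[0.7, 0.8]],
    )
    for tok in example:
        print(tok.position, tok.modality, tok.vector)
```

A sequence built this way could then be fed to a standard transformer encoder, with the modality ids playing a role similar to segment ids in text-only models; whether MTTV encodes modality in this way is an assumption of the sketch.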