Multimodal Fake News Detection via CLIP-Guided Learning

Yangming Zhou,Yuzhou Yang,Qichao Ying,Zhenxing Qian,Xinpeng Zhang
DOI: https://doi.org/10.1109/ICME55011.2023.00480
2023-01-01
Abstract:Fake news detection (FND) has attracted much research interests in social forensics. Many existing approaches introduce tailored attention mechanisms to fuse unimodal features. However, they ignore the impact of cross-modal similarity between modalities. Meanwhile, the potential of pretrained multimodal feature learning models in FND has not been well exploited. This paper proposes an FND-CLIP framework, i.e., a multimodal Fake News Detection network based on Contrastive Language-Image Pretraining (CLIP). FND-CLIP extracts the deep representations together from news using two unimodal encoders and two pair-wise CLIP encoders. The CLIP-generated multimodal features are weighted by CLIP similarity of the two modalities. We also introduce a modality-wise attention module to aggregate the features. Extensive experiments are conducted and the results indicate that the proposed framework has a better capability in mining crucial features for fake news detection. The proposed FND-CLIP can achieve better performances than previous works on three typical fake news datasets.
What problem does this paper attempt to address?