TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer

Eunjee Choi, Jong-Kook Kim
2024-03-19
Abstract: Detecting fake news has received a lot of attention. Many previous methods concatenate independently encoded unimodal data, ignoring the benefits of integrated multimodal information. Also, the absence of specialized feature extraction for text and images further limits these methods. This paper introduces an end-to-end model called TT-BLIP that applies the bootstrapping language-image pretraining for unified vision-language understanding and generation (BLIP) for three types of information: BERT and BLIP_Txt for text, ResNet and BLIP_Img for images, and bidirectional BLIP encoders for multimodal information. The Multimodal Tri-Transformer fuses tri-modal features using three types of multi-head attention mechanisms, ensuring integrated modalities for enhanced representations and improved multimodal data analysis. The experiments are performed using two fake news datasets, Weibo and Gossipcop. The results indicate TT-BLIP outperforms the state-of-the-art models.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses fake news detection. As social media has grown, the spread of fake news has become a serious issue. Existing multimodal methods (text and images) often concatenate independently encoded unimodal features, so they neither integrate cross-modal information nor use feature extractors specialized for each modality, which limits their effectiveness. The paper proposes TT-BLIP, an end-to-end model that applies the pre-trained vision-language model BLIP to three types of information: BERT and BLIP_Txt for text, ResNet and BLIP_Img for images, and bidirectional BLIP encoders for joint image-text information. A Multimodal Tri-Transformer then fuses the text, image, and image-text features using three types of multi-head attention, producing an integrated representation for classification. Experiments on two fake news datasets, Weibo and Gossipcop, show that TT-BLIP outperforms state-of-the-art models. The main contributions are the use of pre-trained BLIP for feature extraction, the new Multimodal Tri-Transformer fusion mechanism, and the reported results on the two multimodal fake news datasets.
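To make the fusion step more concrete, the following is a minimal NumPy sketch of fusing three feature streams with multi-head attention. It is not the authors' implementation. The summary above does not specify how the three attention blocks are wired, so the sketch assumes one plausible arrangement: self-attention over text features, plus two cross-attention blocks in which text queries attend to image features and to image-text features. The function names (multi_head_attention, tri_fusion), the mean pooling, and the random projection matrices standing in for learned weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, context, num_heads, rng):
    """Scaled dot-product multi-head attention.

    query:   (n_q, d) features that ask the questions
    context: (n_k, d) features that are attended to
    Returns the attended output (n_q, d) and the attention
    weights (num_heads, n_q, n_k).
    """
    n_q, d = query.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    d_head = d // num_heads

    # Assumption: random matrices stand in for learned projections.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split(query @ w_q)
    k = split(context @ w_k)
    v = split(context @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, n_q, d_head)

    merged = heads.transpose(1, 0, 2).reshape(n_q, d)
    return merged @ w_o, weights


def tri_fusion(text, image, multimodal, num_heads=4, seed=0):
    """Fuse three feature streams into a single vector of size 3 * d.

    Assumed wiring (not confirmed by the source): text self-attention,
    text-to-image cross-attention, text-to-multimodal cross-attention.
    """
    rng = np.random.default_rng(seed)
    t, _ = multi_head_attention(text, text, num_heads, rng)
    i, _ = multi_head_attention(text, image, num_heads, rng)
    m, _ = multi_head_attention(text, multimodal, num_heads, rng)
    # Mean-pool each stream over tokens, then concatenate.
    return np.concatenate([t.mean(axis=0), i.mean(axis=0), m.mean(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    text_feats = rng.standard_normal((5, 8))    # e.g. 5 text tokens
    image_feats = rng.standard_normal((7, 8))   # e.g. 7 image patches
    mm_feats = rng.standard_normal((6, 8))      # e.g. 6 image-text tokens
    fused = tri_fusion(text_feats, image_feats, mm_feats, num_heads=2)
    print(fused.shape)
```

In a trained model the fused vector would feed a classifier that predicts real versus fake, and the projection matrices would be learned end to end rather than sampled at random.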