Logically at Factify 2022: Multimodal Fact Verification

Jie Gao, Hella-Franziska Hoffmann, Stylianos Oikonomou, David Kiskovski, Anil Bandhakavi

DOI: https://doi.org/10.48550/arXiv.2112.09253

2022-03-26

Abstract: This paper describes our participating system for the multimodal fact verification (Factify) challenge at AAAI 2022. Despite recent advances in text-based verification techniques and in large pre-trained multimodal models across vision and language, very little work has been done on applying multimodal techniques to automate the fact-checking process, particularly given the increasing prevalence of claims and fake news about images and videos on social media. In our work, the challenge is treated as a multimodal entailment task and framed as multi-class classification. Two baseline approaches are proposed and explored: an ensemble model (combining two uni-modal models) and a multimodal attention network (modeling the interaction between the image and text pairs from the claim and the evidence document). We conduct several experiments investigating and benchmarking different SoTA pre-trained transformers and vision models. Our best model is ranked first on the leaderboard, obtaining a weighted average F-measure of 0.77 on both the validation and test sets. An exploratory analysis of the Factify dataset is also carried out, uncovering salient patterns and issues (e.g., word overlap, visual entailment correlation, source bias) that motivate our hypothesis. Finally, we highlight challenges of the task and of the multimodal dataset for future research.
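The weighted average F-measure reported in the abstract can be illustrated with a minimal sketch. It assumes the usual definition, in which per-class F1 scores are averaged with weights proportional to each class's share of the true labels; the function name `weighted_f1` and the toy labels are hypothetical and are not taken from the paper or the challenge's evaluation script.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Each class contributes in proportion to its share of the true labels.
        score += f1 * count / total
    return score


if __name__ == "__main__":
    # Toy labels for illustration only; not Factify data.
    gold = ["a", "a", "b", "c"]
    pred = ["a", "b", "b", "c"]
    print(round(weighted_f1(gold, pred), 4))
```

Because the weights follow class frequency, a frequent class influences the score more than a rare one, which is worth keeping in mind when comparing systems on an imbalanced label distribution.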

Computer Vision and Pattern Recognition, Computation and Language, Multimedia

What problem does this paper attempt to address?

This paper addresses multimodal fact verification, motivated by the growing volume of false claims about images and videos on social media. Specifically, it asks how multimodal techniques can be used to automate the fact-checking process, an area that existing research has explored far less than text-only verification. The authors treat the challenge as a multimodal entailment task and frame it as a multi-class classification problem. They propose two baseline methods: an ensemble model that combines two unimodal models, and a multimodal attention network that models the interaction between the image-text pairs of the claim and the evidence document. The core of the paper is developing models that capture the semantic consistency and integrity between images and text, in order to address the following specific challenges:

1. **Fine-grained image differences**: Simple image similarity cannot distinguish subtle differences between images and performs poorly on adversarial images.
2. **Cross-modal semantic integrity**: A model must learn the content features of images and of text separately and also capture the semantic consistency across the two modalities.
3. **Problems in the dataset**: The exploratory data analysis reveals issues such as vocabulary overlap, visual entailment correlation, and source bias.

With these methods, the paper aims to improve the accuracy and efficiency of multimodal fact verification and thereby help counter the spread of false information on social media.
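The idea of an ensemble that combines two unimodal models can be sketched as simple late fusion. This is an assumption-laden illustration: the source does not say how the text and image models are combined, so the weighted averaging of class probabilities, the function name `fuse_predictions`, the parameter `text_weight`, and the three-class example values are all hypothetical.

```python
def fuse_predictions(text_probs, image_probs, text_weight=0.5):
    """Late-fusion sketch: blend class probabilities from two unimodal models.

    text_probs and image_probs are per-class probabilities for one
    claim/document pair. text_weight is a hypothetical mixing weight in
    [0, 1]; the image model receives the remaining weight.
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    fused = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_probs, image_probs)
    ]
    # The predicted class is the index with the highest fused probability.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Illustrative probabilities only; not outputs of the paper's models.
    text_scores = [0.6, 0.3, 0.1]
    image_scores = [0.2, 0.7, 0.1]
    label, scores = fuse_predictions(text_scores, image_scores)
    print(label, scores)
```

A fixed mixing weight is the simplest possible choice. A learned combiner, or the multimodal attention network described above, would instead let the model decide how much each modality matters for a given example.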

INO at Factify 2: Structure Coherence based Multi-Modal Fact Verification

Findings of Factify 2: Multimodal Fake News Detection

FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models

Factify 2: A Multimodal Fake News and Satire News Dataset

Multimodal Large Language Models to Support Real-World Fact-Checking

Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning

Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies

LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking

Robust Claim Verification Through Fact Detection

Overview of Factify5WQA: Fact Verification through 5W Question-Answering

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation

Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification

Factuality challenges in the era of large language models and opportunities for fact-checking

FactKG: Fact Verification via Reasoning on Knowledge Graphs

End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models

Unsupervised Pretraining for Fact Verification by Language Model Distillation

LogicalFactChecker: Leveraging Logical Operations for Fact Checking with Graph Module Network

Learning to generate and evaluate fact-checking explanations with transformers