Exif2Vec: A Framework to Ascertain Untrustworthy Crowdsourced Images Using Metadata

Muhammad Umair, Athman Bouguettaya, Abdallah Lakhdari, Mourad Ouzzani, Yuyun Liu
DOI: https://doi.org/10.1145/3645094
IF: 3.35
2024-02-14
ACM Transactions on the Web
Abstract: In the context of social media, the integrity of images is often dubious. To tackle this challenge, we introduce Exif2Vec, a novel framework specifically designed to discover modifications in social media images. The proposed framework leverages an image's metadata to discover changes in an image. We use a service-oriented approach that considers discovery of changes in images as a service. A novel word-embedding based approach is proposed to discover semantic inconsistencies in an image metadata that are reflective of the changes in an image. These inconsistencies are used to measure the severity of changes. The novelty of the approach resides in that it does not require the use of images to determine the underlying changes. We use a pretrained Word2Vec model to conduct experiments. The model is validated on two different fact-checked image datasets, i.e., images related to general context and a context specific image dataset. Notably, our findings showcase the remarkable efficacy of our approach, yielding results of up to 80% accuracy. This underscores the potential of our framework.
computer science, information systems, software engineering
What problem does this paper attempt to address?
The problem this paper addresses is: **how to use image metadata to detect untrustworthy (or forged) pictures on social media**. Specifically, the paper proposes a framework named **Exif2Vec**, which aims to discover and quantify modifications in social media pictures by analyzing their metadata. The core of the problem is as follows:

1. **Background problems**:
   - The credibility of pictures on social media is often in doubt. Some pictures may be tampered with to introduce misleading information, which can lead to serious social or political consequences.
   - Modifications can occur in the picture itself or in its description, such as tampering with the date, location, or background information.
2. **Deficiencies of existing methods**:
   - Traditional image processing methods mainly rely on visual content analysis and may overlook clues in metadata.
   - Methods based on user comments or behaviors may be biased, because false content may also receive supportive comments.
3. **Objectives of the paper**:
   - Propose a lightweight and objective framework (Exif2Vec) that relies only on image metadata and related text information to evaluate the credibility of pictures.
   - Detect whether a picture has been tampered with by analyzing the inconsistencies and semantic differences in the metadata.

### Key problems solved by the paper

- **How to discover traces of picture tampering from metadata?**
  - The paper proposes a method based on word embedding (Word2Vec), which transforms non-functional attributes in metadata into vector representations and identifies inconsistencies by calculating similarity distances.
- **How to quantify the severity of picture modification?**
  - The Exif2Vec framework not only detects the existence of modifications but also measures their severity by analyzing semantic differences in metadata.
- **How to deal with missing metadata?**
  - The paper assumes that, from the perspective of social media platforms, metadata is available. If part of the metadata is missing, the relevant information can be supplemented through reverse image search (RIS).

### Core formulas and technical details

In the Exif2Vec framework, the key steps include:

1. **Extract non-functional attributes**:
   - Non-functional attributes include spatial features (such as GPS coordinates, city, country), temporal features (such as shooting time, time-zone offset), and contextual features (such as title, description).
   - Example: suppose the metadata of a certain picture contains the following information:
     $$ \text{Metadata}=\{\text{GPS Coordinates}, \text{City}, \text{State}, \text{Country}, \text{Date}, \text{Time}\} $$
2. **Generate attribute embeddings**:
   - Use a pre-trained Word2Vec model to transform metadata attributes into vector representations:
     $$ \text{Attribute Embedding}=f_{\text{Word2Vec}}(\text{Metadata}) $$
3. **Calculate similarity distances**:
   - Identify inconsistencies by calculating the cosine distance or the Euclidean distance between attribute vectors:
     $$ \text{Similarity Distance}=1-\cos(\vec{v}_i, \vec{v}_j)\quad\text{or}\quad \lVert \vec{v}_i-\vec{v}_j \rVert_2 $$
4. **Quantify the severity of modification**:
   - Evaluate the credibility of the picture according to the number and degree of inconsistencies.

### Summary

The main contribution of the paper is a new perspective on detecting and quantifying picture tampering: analyzing semantic inconsistencies in image metadata. The method has a low computational cost and is suitable for large-scale applications, especially in scenarios with limited resources or tight time constraints. However, the authors also acknowledge that the method may have limitations when the image content itself has been tampered with in complex ways.
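
The embedding and distance steps listed under "Core formulas and technical details" can be illustrated with a small Python sketch. It is not the authors' implementation: the three-dimensional vectors in `TOY_EMBEDDINGS` are made-up stand-ins for a pretrained Word2Vec lookup, and the function `inconsistency_score` (mean pairwise cosine distance) is a hypothetical aggregation chosen only to show how distances between attribute vectors could be turned into a single number.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (assumption): a real pipeline would look these
# values up in a pretrained Word2Vec model instead of hard-coding them.
TOY_EMBEDDINGS = {
    "paris": [0.9, 0.1, 0.0],
    "france": [0.8, 0.2, 0.1],
    "tokyo": [0.1, 0.9, 0.0],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def inconsistency_score(attributes, embeddings=TOY_EMBEDDINGS):
    """Mean pairwise cosine distance between embedded attribute values.

    This aggregation is an illustrative assumption, not the paper's metric.
    Attribute values without an embedding are skipped.
    """
    vectors = [embeddings[a.lower()] for a in attributes if a.lower() in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two known attributes: nothing to compare
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# City and country that belong together versus a mismatched pair.
consistent = inconsistency_score(["Paris", "France"])
suspicious = inconsistency_score(["Tokyo", "France"])
print(f"consistent={consistent:.3f} suspicious={suspicious:.3f}")
```

With these toy vectors the mismatched city and country pair scores higher than the matching pair, which is the kind of signal the summary describes as a semantic inconsistency. Any threshold for flagging a picture would have to be calibrated on labelled data.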