Interactive Event Sifting using Bayesian Graph Neural Networks

José Nascimento, Nathan Jacobs, Anderson Rocha
2024-10-08
Abstract: Forensic analysts often use social media imagery and texts to understand important events. A primary challenge is the initial sifting of irrelevant posts. This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. We propose a method based on Bayesian Graph Neural Networks (BGNNs) and evaluate active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate. Our results indicate that BGNNs are useful for social-media data sifting for forensics investigations of events of interest, the value of active learning and pseudo-labeling varies based on the setting, and incorporating unlabelled data from other events improves performance.
Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
The problem this paper addresses is how, in forensic analysis, to automatically separate the social media posts relevant to a specific event from the large volume of irrelevant ones. Specifically, the paper aims to develop a system that classifies social media posts efficiently and accurately, reducing the manual screening workload of forensic experts. The key challenges are:

1. **Initial screening of irrelevant posts**: The amount of information about an event on social media platforms is huge, and most of it is irrelevant. Effectively identifying and retaining the posts that matter for the event is a major challenge.
2. **High-precision classification with limited labeled data**: Traditional deep-learning methods usually require a large amount of labeled data to perform well, but in practice obtaining that much labeled data is both time-consuming and labor-intensive. Achieving high-precision classification with limited labeled data is therefore an important issue.
3. **Classification uncertainty measurement**: To support active learning, the model needs to measure the uncertainty of its classifications accurately, so that the most valuable samples can be selected for manual labeling.

To solve these problems, the paper proposes an interactive event sifting method based on Bayesian Graph Neural Networks (BGNNs). The method combines active learning and pseudo-labeling, and improves model performance by introducing unlabeled data from other events. The paper also explores how to use KMeans clustering and BALD (Bayesian Active Learning by Disagreement) to select the most representative samples for labeling, thereby improving the model's generalization ability and final classification performance.

### Main contributions

1. **Improved MC Dropout**: Applying MC Dropout as a Bayesian component in graph neural networks and demonstrating its benefit in an active learning setting.
2. **Extended BALD method**: Accounting for diversity when selecting instances to label, using KMeans for data selection and extending BALD to improve both the diversity and the accuracy of the selection.
3. **Utilizing unlabeled data**: Using pseudo-labeling to incorporate unlabeled data from other events, enriching the data set and improving model performance.

Through these methods, the paper aims to provide an efficient, accurate, and interactive classification system that helps forensic experts find event-related social media posts more quickly and accurately.
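To make the selection step more concrete, the following is a minimal sketch of how a BALD score can be computed from several stochastic forward passes (as MC Dropout would produce) and then combined with cluster assignments to favor diverse picks. It is an illustration under stated assumptions, not the paper's implementation: the function names `bald_score` and `select_diverse`, the toy probabilities, and the "best post per cluster" rule are all hypothetical, and the cluster ids are assumed to come from a clustering of post embeddings such as KMeans.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs):
    """BALD score for one post.

    mc_probs: list of T class-probability vectors, one per stochastic
    forward pass (e.g. with dropout left active at inference time).
    Returns H[mean prediction] minus the mean of H[each prediction].
    """
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    expected_entropy = sum(entropy(p) for p in mc_probs) / n_passes
    return entropy(mean_probs) - expected_entropy


def select_diverse(scores, cluster_ids, budget):
    """Pick up to `budget` indices, taking the highest-scoring post per cluster.

    Assumption: cluster_ids come from clustering post embeddings
    (e.g. KMeans); the paper's exact combination rule may differ.
    """
    best = {}
    for idx, (score, cid) in enumerate(zip(scores, cluster_ids)):
        if cid not in best or score > scores[best[cid]]:
            best[cid] = idx
    ranked = sorted(best.values(), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Three posts, two MC Dropout passes each, binary relevant/irrelevant output.
mc_outputs = [
    [[0.9, 0.1], [0.9, 0.1]],  # passes agree and are confident
    [[0.5, 0.5], [0.5, 0.5]],  # passes agree but are unsure
    [[0.9, 0.1], [0.1, 0.9]],  # passes disagree with each other
]
scores = [bald_score(m) for m in mc_outputs]
chosen = select_diverse(scores, cluster_ids=[0, 0, 1], budget=2)
```

In this toy example only the third post receives a positive BALD score, because BALD rewards disagreement between passes rather than plain predictive uncertainty: the second post is uncertain, but every pass agrees on that uncertainty. Restricting the selection to one post per cluster is one simple way to keep the annotator from labeling several near-duplicate posts in the same round.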