Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Abhijnan Nath,Huma Jamil,Shafiuddin Rehan Ahmed,George Baker,Rahul Ghosh,James H. Martin,Nathaniel Blanchard,Nikhil Krishnaswamy

2024-04-13

Abstract: Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by semantic and discourse-level difficulty. We evaluate on 2 datasets: the augmented ECB+, and AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing assumptions used, and establish a novel baseline on AIDA Phase 1. Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems, and highlight a need for more multimodal resources in the coreference resolution space.

Computation and Language

What problem does this paper attempt to address?

The paper addresses cross-document event coreference resolution (ECR): deciding whether event mentions in different documents, often from different sources and worded differently, refer to the same underlying event. It proposes using multimodal information (text and images) to improve resolution when language alone is ambiguous. The key contributions are:

1. **A multimodal cross-document ECR method**: The method combines visual and textual cues and connects the vision and language models through a simple linear map.
2. **An augmented dataset**: Because existing ECR benchmarks rarely provide images for all event mentions, the authors augment ECB+ with event-centric images scraped from the internet and generated with image diffusion models.
3. **Three ways of combining images and text for coreference**: a standard fused model with finetuning, a novel linear mapping method without finetuning, and an ensembling approach that splits mention pairs by semantic and discourse-level difficulty.
4. **Evaluation results**: On the augmented ECB+, the ensemble systems using cross-modal linear mapping reach 91.9 CoNLL F1, which the authors describe as an upper limit given the preprocessing assumptions used. On AIDA Phase 1, they establish a novel baseline.

Together, these results show that multimodal information helps with certain challenging coreference problems, and they highlight the need for more multimodal resources in coreference resolution.
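To make the idea of a "simple linear map" between a vision model and a language model concrete, here is a minimal sketch in Python. It is not the authors' implementation: the summary does not say how the map is fit or in which direction it runs, so this assumes a ridge-regularized least-squares fit from image embeddings to text embeddings, and it uses synthetic vectors in place of real model outputs. The names `fit_linear_map`, `img_emb`, and `txt_emb` are hypothetical.

```python
import numpy as np


def fit_linear_map(src, tgt, ridge=1e-3):
    """Fit W minimizing ||src @ W - tgt||^2 + ridge * ||W||^2 (closed form)."""
    d = src.shape[1]
    gram = src.T @ src + ridge * np.eye(d)
    return np.linalg.solve(gram, src.T @ tgt)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for paired embeddings (assumption: real embeddings
# would come from frozen vision and language encoders).
rng = np.random.default_rng(0)
n, d_img, d_txt = 200, 16, 8
true_map = rng.normal(size=(d_img, d_txt))
img_emb = rng.normal(size=(n, d_img))
txt_emb = img_emb @ true_map + 0.01 * rng.normal(size=(n, d_txt))

# Fit the map on the paired data; no encoder weights are updated.
W = fit_linear_map(img_emb, txt_emb)

# Project an image embedding into the text space and compare it there.
mapped = img_emb @ W
sim = cosine(mapped[0], txt_emb[0])
```

Because only the matrix `W` is estimated, both encoders can stay frozen, which is consistent with the abstract's description of the linear mapping method as working without finetuning.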

Linear Cross-document Event Coreference Resolution with X-AMR

Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval

EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning

Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes

Cross-Modal Reasoning with Event Correlation for Video Question Answering

Event Coreference Resolution for Contentious Politics Events

Filling in the Gaps: Efficient Event Coreference Resolution using Graph Autoencoder Networks

Investigating Multilingual Coreference Resolution by Universal Annotations

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining

Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes

Multimodal feature fusion for robust event detection in web videos

Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

Cross-modal Image-Text Retrieval with Multitask Learning