Abstract: This study investigates a novel commonsense inference task: given an event described in brief free-form text, reason about the subsequent event chains and the probable consumption intents. For instance, given the event “fall in love”, the system anticipates ensuing events such as “get married”→“be pregnant”→“have a baby”. Concurrently, the system predicts the likely consumption intents of the event participants, such as purchasing “jewelry”, “maternity clothing”, and “baby food”. The generated event chains provide explicit, meaningful explanations that aid the understanding of the inferred consumption intents. To facilitate this study, we construct a new crowdsourced corpus, x-ECON. The corpus comprises 10,144 event chains and 150 event-related product categories, covering a wide range of everyday consumer events and situations. We introduce a baseline reasoning framework that not only infers consumption intent but also generates an event chain to explain its inference. Our experimental results suggest that incorporating event chain reasoning into the event-consumption reasoning system helps neural networks reason about the probable consumption intents of the event participants. Additionally, we demonstrate the applicability of our approach to improving the performance of recommendation systems, highlighting the role of explainable commonsense inference about consumption intent. We also evaluate the performance of ChatGPT on the x-ECON dataset, showing that explainable event-consumption commonsense reasoning is a challenging task for large language models.

ECHo: A Visio-Linguistic Dataset for Event Causality Inference via Human-Centric Reasoning

Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework

CELLO: Causal Evaluation of Large Vision-Language Models

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

MTR: A Dataset Fusing Inductive, Deductive, and Defeasible Reasoning

ERGO: Event Relational Graph Transformer for Document-level Event Causality Identification

AVoE: A Synthetic 3D Dataset on Understanding Violation of Expectation for Artificial Cognition

x-ECON: Explainable Event-Consumption Commonsense Reasoning

CLEVRER-Humans: Describing Physical and Causal Events the Human Way

E-Care: a New Dataset for Exploring Explainable Causal Reasoning

ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog

VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool

Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models

Cross-modal Observation Hypothesis Inference