Abstract: Semantic comprehension aims to reasonably reproduce people's real intentions or thoughts, e.g., sentiment, humor, sarcasm, motivation, and offensiveness, from multiple modalities. It can be formulated as a multimodal, multitask classification problem and applied to scenarios such as online public opinion supervision and political stance analysis. Previous methods generally either employ multimodal learning alone to handle varied modalities or exploit multitask learning alone to solve various tasks; few unify both in an integrated framework. Moreover, multimodal-multitask cooperative learning inevitably encounters the challenge of modeling high-order relationships, i.e., intramodal, intermodal, and intertask relationships. Related research in brain science shows that the human brain possesses multimodal perception and multitask cognition for semantic comprehension via decomposing, associating, and synthesizing processes. Thus, establishing a brain-inspired semantic comprehension framework to bridge the gap between multimodal and multitask learning is the primary motivation of this work. Motivated by the strength of the hypergraph in modeling high-order relations, in this article we propose a hypergraph-induced multimodal-multitask (HIMM) network for semantic comprehension. HIMM incorporates monomodal, multimodal, and multitask hypergraph networks that mimic the decomposing, associating, and synthesizing processes to handle the intramodal, intermodal, and intertask relationships, respectively. Furthermore, temporal and spatial hypergraph constructions are designed to model the relationships within modalities that have sequential and spatial structures, respectively. We also devise a hypergraph alternative updating algorithm to ensure that vertices aggregate to update hyperedges and hyperedges converge to update their connected vertices. Experiments on a dataset with two modalities and five tasks verify the effectiveness of HIMM on semantic comprehension.
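
The abstract describes a temporal hypergraph construction and an updating scheme in which vertices update hyperedges and hyperedges then update their vertices. The following is a minimal sketch of that general idea, not the HIMM implementation. It assumes a sliding window over time steps as the temporal construction and plain mean aggregation in both directions; the names `temporal_hyperedges`, `alternating_update`, and the `window` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def temporal_hyperedges(num_steps: int, window: int) -> List[List[int]]:
    """Assumed construction: each run of `window` consecutive steps is one hyperedge."""
    if window < 1 or window > num_steps:
        raise ValueError("window must be between 1 and num_steps")
    return [list(range(s, s + window)) for s in range(num_steps - window + 1)]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def alternating_update(
    vertices: List[Vector], hyperedges: List[List[int]]
) -> Tuple[List[Vector], List[Vector]]:
    """One round of vertex -> hyperedge -> vertex updating (mean aggregation assumed)."""
    # Step 1: each hyperedge feature is the mean of its member vertices.
    edge_feats = [mean([vertices[v] for v in edge]) for edge in hyperedges]

    # Step 2: each vertex feature is the mean of the hyperedges that contain it.
    new_vertices = []
    for v, feat in enumerate(vertices):
        incident = [edge_feats[i] for i, edge in enumerate(hyperedges) if v in edge]
        # A vertex that belongs to no hyperedge keeps its original feature.
        new_vertices.append(mean(incident) if incident else list(feat))
    return new_vertices, edge_feats


if __name__ == "__main__":
    feats = [[0.0], [2.0], [4.0], [6.0]]   # four time steps, 1-D features
    edges = temporal_hyperedges(len(feats), window=2)
    updated, edge_feats = alternating_update(feats, edges)
    print(edges, edge_feats, updated)
```

A learned model would typically replace the plain means with trainable, weighted aggregation; the sketch only illustrates the two-step message flow that a hypergraph makes possible when one hyperedge joins more than two vertices.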

Investigating Inner Properties of Multimodal Representation and Semantic Compositionality with Brain-based Componential Semantics

Visualizing and Understanding Neural Models in NLP

Bridging the Semantic Latent Space Between Brain and Machine: Similarity is All You Need

Investigating Multisensory Integration in Emotion Recognition Through Bio-Inspired Computational Models

A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Interactions Between Modal and Amodal Semantic Areas in Spoken Word Comprehension

Model Composition for Multimodal Large Language Models

MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model

CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

Semantic Composition in Visually Grounded Language Models

Modeling Semantic Compositionality With Sememe Knowledge

Multimodal foundation models are better simulators of the human brain

Brain-inspired Multimodal Learning Based on Neural Networks

Exploring the Relationship Between Visual Information and Language Semantic Concept in the Human Brain

Modelling Multimodal Integration in Human Concept Processing with Vision-and-Language Models

Decoding Semantics Categorization During Natural Viewing of Video Streams

A Tri-network Model of Human Semantic Processing

Modeling High-Order Relationships: Brain-Inspired Hypergraph-Induced Multimodal-Multitask Framework for Semantic Comprehension

Multimodal Deep Embedding via Hierarchical Grounded Compositional Semantics

Distributed Representation of Semantics in the Human Brain: Evidence from Studies Using Natural Language Processing Techniques

Neural language model and semantic compositionality model in semantic similarity