Abstract: Nowadays, community question answering (CQA) systems have attracted millions of users to share their valuable knowledge. Matching relevant answers for a specific question is a core function of CQA systems. Previous interaction-based matching approaches show promising performance in CQA systems. However, they typically suffer from two limitations: (1) They usually model content as word sequences, which ignores the semantics provided by non-consecutive phrases, long-distance word dependency and visual information. (2) Word-level interactions focus on the distribution of similar words in terms of position, while being agnostic to the semantic-level interactions between questions and answers. To address these limitations, we propose a Hierarchical Graph Semantic Pooling Network (HGSPN) to model the hierarchical semantic-level interactions in a unified framework for multi-modal CQA matching. Instead of viewing text content as word sequences, we convert them into graphs, which can model non-consecutive phrases and long-distance word dependency for better obtaining the composition of semantics. In addition, visual content is also modeled into the graphs to provide complementary semantics. A well-designed stacked graph pooling network is proposed to capture the hierarchical semantic-level interactions between questions and answers based on these graphs. A novel convolutional matching network is designed to infer the matching score by integrating the hierarchical semantic-level interaction features. Experimental results on two real-world datasets demonstrate that our model outperforms the state-of-the-art CQA matching models.
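
The abstract does not say how text is converted into a graph, so the following is only an illustrative sketch of one common option, not the HGSPN construction itself. It assumes a sliding-window co-occurrence graph, in which words that appear within a few positions of each other are linked, so that non-adjacent words can share an edge. The function name `build_word_graph` and the `window` parameter are hypothetical.

```python
from collections import defaultdict


def build_word_graph(tokens, window=3):
    """Build an undirected word co-occurrence graph (assumed construction).

    Two distinct words are linked when they occur inside the same sliding
    window of `window` consecutive tokens. Edge weights count how often
    the pair co-occurs.
    """
    edges = defaultdict(int)
    for i, left in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for j in range(i + 1, min(i + window, len(tokens))):
            right = tokens[j]
            if left != right:
                # frozenset makes the edge undirected.
                edges[frozenset((left, right))] += 1
    nodes = sorted(set(tokens))
    return nodes, dict(edges)


question = "how to fix a flat bike tire".split()
nodes, edges = build_word_graph(question, window=3)
for pair, weight in edges.items():
    print(sorted(pair), weight)
```

With `window=3`, "how" and "fix" are connected even though they are not adjacent, which is the kind of non-consecutive relation a purely sequential model would have to learn indirectly. A larger window adds longer-range edges at the cost of a denser graph.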

Discovering Multimodal Hierarchical Structures with Graph Neural Networks for Multi-modal and Multi-hop Question Answering

Hierarchical Graph Network for Multi-hop Question Answering

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multimodal Graph Transformer for Multimodal Question Answering

From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering

An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism

VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering

Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

Multi-Turn Video Question Answering Via Multi-Stream Hierarchical Attention Context Network

Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks

Hierarchical Attention Networks for Multimodal Machine Learning

Question guided multimodal receptive field reasoning network for fact-based visual question answering

Question-Aware Memory Network for Multi-hop Question Answering in Human-Robot Interaction

Hierarchical Graph Semantic Pooling Network for Multi-modal Community Question Answer Matching

Multi-modal Contextual Graph Neural Network for Text Visual Question Answering

Joint Learning of Object Graph and Relation Graph for Visual Question Answering

A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering

Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering

Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning

Ask to Understand: Question Generation for Multi-hop Question Answering

Multi-hop Question Answering via Reasoning Chains