Abstract: The goal of sentence matching is to determine the semantic relation between two sentences, which is the basis of many downstream tasks in natural language processing, such as question answering and information retrieval. Recent studies that use attention mechanisms to align the elements of two sentences have shown promising results in capturing semantic similarity and relevance. Most existing methods focus on the design of multi-layer attention networks; however, some critical issues have not been handled well: 1) higher attention layers are easily affected by error propagation because they rely on the alignment results of preceding attention layers; 2) models risk losing low-layer semantic features as network depth increases; and 3) capturing global matching information incurs high computational complexity during model training. To address these issues, we propose a Deep Bi-Directional Interaction Network (DBDIN), which captures semantic relatedness from two directions, with each direction employing multiple attention-based interaction units. Specifically, the attention of each interaction unit repeatedly focuses on the original representation of the other sentence for semantic alignment, which alleviates the error propagation problem by attending to a fixed semantic representation. We then design a deep fusion mechanism to aggregate and propagate attention information from low layers to high layers, which effectively retains low-layer semantic features for subsequent interactions. Moreover, we introduce a self-attention mechanism at the end to enhance global matching information with lower model complexity. We conduct experiments on natural language inference and paraphrase identification tasks with three benchmark datasets: SNLI, SciTail, and Quora. Experimental results demonstrate that our proposed method achieves significant improvements over baseline systems without using any external knowledge.
Additionally, we conduct an interpretability study to show how our deep interaction network with attention benefits sentence matching, which provides a reference for future model design. Ablation studies and visualization analyses further verify that our model better captures interactive information between two sentences and that the proposed components help model semantic relations more precisely.
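
The abstract describes the architecture only in words, so the following is a minimal, illustrative sketch of the three ideas rather than the authors' implementation. It assumes plain dot-product attention, and it uses a simple three-way average as a hypothetical stand-in for the deep fusion step, since the abstract gives no formulas. All function names (`attend`, `fuse`, `one_direction`, `bidirectional`) and the toy vectors are invented for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, memory):
    # For each query vector, return a softmax-weighted average of the
    # memory vectors (dot-product attention; an assumed scoring function).
    dim = len(memory[0])
    aligned = []
    for q in queries:
        weights = softmax([sum(x * y for x, y in zip(q, m)) for m in memory])
        aligned.append([sum(w * m[d] for w, m in zip(weights, memory))
                        for d in range(dim)])
    return aligned

def fuse(state, aligned, original):
    # Hypothetical fusion: average the current state, the aligned vectors,
    # and the original low-layer features so the latter are never lost.
    return [[(s + a + o) / 3.0 for s, a, o in zip(sv, av, ov)]
            for sv, av, ov in zip(state, aligned, original)]

def one_direction(source, target, num_units=3):
    # Every interaction unit attends to the FIXED original representation
    # of the other sentence, not to an output of an earlier unit.
    state = source
    for _ in range(num_units):
        aligned = attend(state, target)
        state = fuse(state, aligned, source)
    # A single self-attention pass at the end, over the fused state.
    return attend(state, state)

def bidirectional(sent_a, sent_b, num_units=3):
    # Run the same procedure in both directions (A over B, and B over A).
    return (one_direction(sent_a, sent_b, num_units),
            one_direction(sent_b, sent_a, num_units))

# Toy token vectors (made up for illustration).
sent_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sent_b = [[1.0, 1.0], [0.0, 2.0]]
a_view, b_view = bidirectional(sent_a, sent_b)
print(len(a_view), len(b_view))
```

In this sketch the target representation passed to `attend` never changes across units, which is the property the abstract credits with limiting error propagation; a real model would add learned projections and a prediction layer on top.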

Attention-Based Multi-level Network for Text Matching with Feature Fusion

Multi-level network based on transformer encoder for fine-grained image–text matching

Advanced Multimodal Deep Learning Architecture for Image-Text Matching

Interactive Attention Networks for Semantic Text Matching

Multiresolution Graph Attention Networks for Relevance Matching

Fusion Layer Attention for Image-Text Matching

Lightweight Text Matching Method with Rich Features

Deep Hierarchical Attention Networks for Text Matching in Information Retrieval

Original Semantics-Oriented Attention and Deep Fusion Network for Sentence Matching

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

Multi-scale Matching Networks for Semantic Correspondence

Reference-Aware Adaptive Network for Image-Text Matching

A Multi-Level Feature Fusion Network for Scene Text Detection with Text Attention Mechanism

Deep and Shallow Features Learning for Short Texts Matching

Attention-Fused Deep Matching Network for Natural Language Inference

Multi-scale Motivated Neural Network for Image-Text Matching

Multi-level Local Alignment and Semantic Matching Network for Image-Text Retrieval

Deep Bi-Directional Interaction Network for Sentence Matching

Multi-Modal Memory Enhancement Attention Network for Image-Text Matching

DIFM: an Effective Deep Interaction and Fusion Model for Sentence Matching

aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model