Abstract:Natural language inference (NLI) task requires an agent to determine the semantic relation between a premise sentence ( ${p}$ ) and a hypothesis sentence ( ${h}$ ), which demands sufficient understanding about sentences semantic. Due to the issues, such as polysemy, ambiguity, as well as fuzziness of sentences, intense sentence understanding is very challenging. To this end, in this article, we introduce the corresponding image of sentences as reference information for enhancing sentence semantic understanding and representing. Specifically, we first propose an image-enhanced multilevel sentence representation net (IEMLRN), that utilizes the image features from pretrained models for enhancing the sentence semantic understanding at different scales, i.e., lexical-level, phrase-level, and sentence-level. The proposed model advances the performance on NLI tasks by leveraging the pretrained global features of images. However, as these pretrained image features are optimized on specific image classification datasets, they may not show the best performance on NLI tasks. Therefore, we further propose to design an adaptive image feature generator that extracts fine-grained image labels from the corresponding sentences. After that, we extend the IEMLRN to multilevel image-enhanced sentence representation net (MIESR) by utilizing not only the coarse-grained pretrained features of images, but also the fine-grained adaptive features of images. Therefore, sentence semantic can be evaluated and enhanced more comprehensively and precisely. Extensive experiments on two benchmark datasets (SNLI, SICK) clearly show our proposed IEMLRN significantly outperform the state-of-the-art baselines, and our proposed MIESR model achieves the best performance by considering not only the text but also images in an adaptive multigranularities way.

Semantic Representation and Inference for NLP

Structured Label Inference for Visual Understanding.

Visualizing and Understanding Neural Models in NLP

Automatic Document Summarization Via Deep Neural Networks

Enhanced Lstm For Natural Language Inference

Natural Language Inference Using Lstm Model With Sentence Fusion

Semantics-Aware Inferential Network for Natural Language Understanding

Learning the Implicit Semantic Representation on Graph-Structured Data

Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference.

Evaluating Distributed Representations for Multi-Level Lexical Semantics: A Research Proposal

Unsupervised Deep Structured Semantic Models for Commonsense Reasoning.

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

Improving Interpretability of Deep Neural Networks with Semantic Information

Progress in Neural NLP: Modeling, Learning, and Reasoning

Leveraging Natural Supervision for Language Representation Learning and Generation

Deep Semantic Role Labeling With Self-Attention

Deep Semantic Understanding of High Resolution Remote Sensing Image

Multilevel Image-Enhanced Sentence Representation Net for Natural Language Inference

Natural Language Inference with External Knowledge.

Explicit Contextual Semantics for Text Comprehension

On Learning Semantic Representations for Large-Scale Abstract Sketches