Abstract: The natural language inference (NLI) task requires an agent to determine the semantic relation between a premise sentence ($p$) and a hypothesis sentence ($h$), which demands a sufficient understanding of sentence semantics. Because of issues such as polysemy, ambiguity, and the fuzziness of sentences, in-depth sentence understanding is very challenging. To this end, in this article, we introduce the images corresponding to sentences as reference information for enhancing sentence semantic understanding and representation. Specifically, we first propose an image-enhanced multilevel sentence representation net (IEMLRN) that utilizes image features from pretrained models to enhance sentence semantic understanding at different scales, i.e., the lexical level, the phrase level, and the sentence level. The proposed model advances performance on NLI tasks by leveraging the pretrained global features of images. However, as these pretrained image features are optimized on specific image classification datasets, they may not yield the best performance on NLI tasks. Therefore, we further design an adaptive image feature generator that extracts fine-grained image labels from the corresponding sentences. We then extend IEMLRN to the multilevel image-enhanced sentence representation net (MIESR) by utilizing not only the coarse-grained pretrained features of images but also the fine-grained adaptive features of images. As a result, sentence semantics can be evaluated and enhanced more comprehensively and precisely. Extensive experiments on two benchmark datasets (SNLI and SICK) clearly show that our proposed IEMLRN significantly outperforms the state-of-the-art baselines, and that our proposed MIESR model achieves the best performance by considering not only the text but also images in an adaptive, multigranularity way.
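The idea of enhancing a sentence with an image feature at the lexical, phrase, and sentence levels can be illustrated with a small sketch. This is not the IEMLRN or MIESR architecture, whose details the abstract does not give. It assumes a simple sigmoid-gated fusion, a sliding-window average as the phrase composition, and mean pooling for the sentence vector. The names `fuse`, `multilevel_representation`, and `window` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fuse(text_vec: Vector, image_vec: Vector) -> Vector:
    """Assumed gated fusion: g = sigmoid(text . image); out = g*text + (1-g)*image."""
    gate = 1.0 / (1.0 + math.exp(-sum(t * i for t, i in zip(text_vec, image_vec))))
    return [gate * t + (1.0 - gate) * i for t, i in zip(text_vec, image_vec)]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def multilevel_representation(
    word_vecs: List[Vector], image_vec: Vector, window: int = 2
) -> Tuple[List[Vector], List[Vector], Vector]:
    """Return (lexical, phrase, sentence) representations, each fused with the image."""
    if not word_vecs:
        raise ValueError("word_vecs must contain at least one word vector")
    # Lexical level: fuse every word vector with the image feature.
    lexical = [fuse(w, image_vec) for w in word_vecs]
    # Phrase level (assumption): average sliding windows of words, then fuse again.
    n_windows = max(1, len(lexical) - window + 1)
    phrase = [fuse(mean(lexical[i:i + window]), image_vec) for i in range(n_windows)]
    # Sentence level (assumption): mean-pool the phrase vectors, then fuse once more.
    sentence = fuse(mean(phrase), image_vec)
    return lexical, phrase, sentence
```

In an NLI setting, the sentence vectors of the premise and the hypothesis would each be computed this way and then passed to a classifier. In a trained model, the gate and the composition functions would be learned rather than fixed as they are here.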

Feature Fusion Transformer Network for Natural Language Inference

Multi-Feature Fusion Transformer for Natural Language Inference

Natural Language Inference Using LSTM Model with Sentence Fusion

Attention-Fused Deep Matching Network for Natural Language Inference

Convolutional Interaction Network for Natural Language Inference

SDF-NN: A Deep Neural Network with Semantic Dropping and Fusion for Natural Language Inference

Gaussian Transformer: A Lightweight Approach for Natural Language Inference

Multi-turn Inference Matching Network for Natural Language Inference

Natural Language Inference Based on the LIC Architecture with DCAE Feature

Research on Attention Memory Networks As a Model for Learning Natural Language Inference.

Deep Converged Network for Attention

Collaborative Attention Network for Natural Language Inference

Context-Aware Tree-Based Convolutional Neural Networks for Natural Language Inference.

Dependent Multilevel Interaction Network For Natural Language Inference

Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference.

Enhanced LSTM for Natural Language Inference

Natural Language-centered Inference Network for Multi-modal Fake News Detection

Multilevel Image-Enhanced Sentence Representation Net for Natural Language Inference

Asynchronous Deep Interaction Network for Natural Language Inference.

FTC-Net: Fusion of Transformer and CNN Features for Infrared Small Target Detection

CTRAN: CNN-Transformer-based Network for Natural Language Understanding