Abstract: Neural network-based approaches have become the driving force behind Natural Language Processing (NLP) tasks. Conventionally, there are two mainstream neural architectures for NLP: the recurrent neural network (RNN) and the convolutional neural network (ConvNet). RNNs are good at modeling long-term dependencies over input texts, but they preclude parallel computation. ConvNets have no memory capability and must model sequential data as unordered features. As a result, ConvNets fail to learn sequential dependencies over the input texts, but they can carry out highly efficient parallel computation. Because each architecture has its own pros and cons, integrating different architectures is assumed to enrich the semantic representation of texts and thus enhance performance on NLP tasks. However, few investigations have explored how to reconcile these seemingly incompatible architectures. To address this issue, we propose a hybrid architecture based on a novel hierarchical multi-granularity attention mechanism, named Multi-granularity Attention-based Hybrid Neural Network (MahNN). The attention mechanism assigns different weights to different parts of the input sequence to increase the computational efficiency and performance of neural models. In MahNN, two types of attention are introduced: syntactical attention and semantical attention. Syntactical attention computes the importance of syntactic elements (such as words or sentences) at the lower symbolic level, while semantical attention computes the importance of the embedding-space dimensions corresponding to the upper latent semantics. We adopt text classification as an exemplifying task to illustrate the ability of MahNN to understand texts. Experimental results on a variety of datasets demonstrate that MahNN outperforms most state-of-the-art methods for text classification.
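The two attention granularities described in the abstract can be illustrated with a minimal sketch. It is not the MahNN formulation, which the abstract does not specify: the dot-product scoring, the `query` vector, the `dim_logits` parameter, and all function names are assumptions introduced here for illustration only. The sketch weights each token (the syntactical level), weights each embedding dimension (the semantical level), and combines both into one pooled text vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(embeddings, query):
    """One weight per token (illustrative 'syntactical' level).

    Scores are plain dot products with a hypothetical query vector.
    """
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in embeddings]
    return softmax(scores)


def dimension_attention(dim_logits):
    """One weight per embedding dimension (illustrative 'semantical' level).

    dim_logits stands in for a hypothetical learned parameter vector.
    """
    return softmax(dim_logits)


def pooled_representation(embeddings, word_weights, dim_weights):
    """Weighted sum over tokens, then rescaled per embedding dimension."""
    dim = len(dim_weights)
    pooled = []
    for d in range(dim):
        token_sum = sum(w * vec[d] for w, vec in zip(word_weights, embeddings))
        pooled.append(dim_weights[d] * token_sum)
    return pooled


# Toy input: three tokens with two-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]          # hypothetical query vector
dim_logits = [0.0, 0.0]     # hypothetical per-dimension logits

word_w = word_attention(embeddings, query)
dim_w = dimension_attention(dim_logits)
pooled = pooled_representation(embeddings, word_w, dim_w)
```

In this toy setup the first and third tokens align with the query and therefore receive larger weights than the second, while the equal logits give both embedding dimensions the same weight. In a trained model both the query and the per-dimension logits would be learned rather than fixed.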

Multi-Scale Self-Attention for Text Classification

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Densely Connected CNN with Multi-scale Feature Attention for Text Classification

Hierarchical Multi-Granularity Attention-Based Hybrid Neural Network for Text Classification.

Graph Attention Transformer Network for Multi-label Image Classification

Hierarchical Multi-label Text Classification: Self-adaption Semantic Awareness Network Integrating Text Topic and Label Level Information

Dual-axial Self-Attention Network for Text Classification

A Multiscale Visualization of Attention in the Transformer Model

Channel2DTransformer: A Multi-level Features Self-attention Fusion Module for Semantic Segmentation

Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

MHST: Multiscale Head Selection Transformer for Hyperspectral and LiDAR Classification

Multiscale attention for few‐shot image classification

Enhanced Pre-Trained Transformer with Aligned Attention Map for Text Matching

Multi-Scale Residual Spectral–Spatial Attention Combined with Improved Transformer for Hyperspectral Image Classification

Tree Transformer: Integrating Tree Structures into Self-Attention.

Query2Label: A Simple Transformer Way to Multi-Label Classification

Chinese Short Text Classification with Mutual-Attention Convolutional Neural Networks

Vision Transformers with Hierarchical Attention

Low-Rank and Locality Constrained Self-Attention for Sequence Modeling.