Abstract: Neural network-based approaches have become the driving force for Natural Language Processing (NLP) tasks. Conventionally, there are two mainstream neural architectures for NLP tasks: the recurrent neural network (RNN) and the convolutional neural network (ConvNet). RNNs are good at modeling long-term dependencies over input texts, but preclude parallel computation. ConvNets have no memory capability and must model sequential data as unordered features. Therefore, ConvNets fail to learn sequential dependencies over the input texts, but they can carry out highly efficient parallel computation. As each neural architecture, such as the RNN and the ConvNet, has its own pros and cons, integrating different architectures is assumed to enrich the semantic representation of texts and thus enhance the performance of NLP tasks. However, few investigations explore the reconciliation of these seemingly incompatible architectures. To address this issue, we propose a hybrid architecture based on a novel hierarchical multi-granularity attention mechanism, named Multi-granularity Attention-based Hybrid Neural Network (MahNN). The attention mechanism assigns different weights to different parts of the input sequence to increase the computational efficiency and performance of neural models. In MahNN, two types of attention are introduced: syntactical attention and semantical attention. Syntactical attention computes the importance of syntactic elements (such as words or sentences) at the lower symbolic level, and semantical attention computes the importance of the embedded space dimensions corresponding to the upper latent semantics. We adopt text classification as an example to illustrate the ability of MahNN to understand texts. The experimental results on a variety of datasets demonstrate that MahNN outperforms most state-of-the-art methods for text classification.
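
The two attention granularities described in the abstract can be illustrated with a minimal sketch. It is not the MahNN implementation: the abstract does not say how the attention scores are computed, so the dot-product scoring, the `query` vector, the `dim_scores` vector, and all function names below are assumptions standing in for learned parameters.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def syntactical_attention(embeddings, query):
    """One weight per word (the lower, symbolic level).

    Assumption: each word is scored by a dot product with a
    hypothetical query vector; the abstract does not fix the scoring.
    """
    scores = [sum(e * q for e, q in zip(vec, query)) for vec in embeddings]
    return softmax(scores)


def semantical_attention(dim_scores):
    """One weight per embedding dimension (the upper, latent level).

    Assumption: dim_scores is a hypothetical per-dimension score vector.
    """
    return softmax(dim_scores)


def attend(embeddings, query, dim_scores):
    """Pool word vectors by word weights, then rescale each dimension."""
    word_weights = syntactical_attention(embeddings, query)
    dim_weights = semantical_attention(dim_scores)
    n_dims = len(embeddings[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(word_weights, embeddings))
        for d in range(n_dims)
    ]
    return [dw * p for dw, p in zip(dim_weights, pooled)]


# Three toy "words" with two-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_vector = attend(toy_embeddings, query=[1.0, 1.0], dim_scores=[0.0, 0.0])
```

In this toy input the third word aligns best with the assumed query, so it receives the largest word-level weight, while equal dimension scores leave both embedding dimensions weighted equally.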

HiM: hierarchical multimodal network for document layout analysis

HSCA-Net: A Hybrid Spatial-Channel Attention Network in Multi-Scale Feature Pyramid for Document Layout Analysis

M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

Hierarchical Inter-Attention Network for Document Classification with Multi-Task Learning.

A Hybrid Approach for Document Layout Analysis in Document images

LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Hierarchical Multi-Granularity Attention-Based Hybrid Neural Network for Text Classification.

Hierarchical Multi-label Text Classification: an Attention-based Recurrent Network Approach

Hierarchical Multi-modal Prompting Transformer for Multi-modal Long Document Classification

HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

Hierarchical Multi-granularity Interaction Graph Convolutional Network for Long Document Classification

Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval

HmcNet: A General Approach for Hierarchical Multi-Label Classification

LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

HMNet: a Hierarchical Multi-Modal Network for Educational Video Concept Prediction

DocLLM: A layout-aware generative language model for multimodal document understanding