Abstract: With the development of deep learning, several graph neural network (GNN)-based approaches have been utilized for text classification. However, GNNs encounter challenges when capturing contextual text information within a document sequence. To address this, a novel text classification model, RB-GAT, is proposed by combining RoBERTa-BiGRU embedding and a multi-head Graph ATtention Network (GAT). First, the pre-trained RoBERTa model is exploited to learn word and text embeddings in different contexts. Second, the Bidirectional Gated Recurrent Unit (BiGRU) is employed to capture long-term dependencies and bidirectional sentence information from the text context. Next, the multi-head graph attention network is applied to analyze this information, which serves as a node feature for the document. Finally, the classification results are generated through a Softmax layer. Experimental results on five benchmark datasets demonstrate that our method achieves an accuracy of 71.48%, 98.45%, 80.32%, 90.84%, and 95.67% on Ohsumed, R8, MR, 20NG, and R52, respectively, which is superior to nine existing text classification approaches.

What problem does this paper attempt to address?

The paper addresses the difficulties that existing text classification models face when dealing with graph-structured data. In particular, when text graphs contain multi-level relationships among documents, paragraphs, and words, traditional sequential deep-learning models may not be able to handle these complex relationships effectively. Moreover, existing text classification methods based on graph neural networks (GNNs) usually use simple node feature initialization methods (such as one-hot encoding), which may lead to high-dimensional, sparse feature matrices that cannot effectively express text similarity and that ignore the fine-grained interactions of words in the document sequence.

Specifically, the paper points out the following problems:

1. **Challenges in Capturing Context Information**: Existing GNN methods have difficulty capturing context information in document sequences, and perform especially poorly on long-term dependencies and bidirectional sentence information.
2. **Limitations of Feature Representations**: Traditional methods use one-hot encoding to initialize node features, resulting in high-dimensional sparse matrices that cannot effectively express text similarity.
3. **Ignoring Sentiment-Rich Text Data**: Existing methods struggle to handle sentiment-rich text data because they ignore the complex relationships of words in the document sequence.

To solve these problems, the paper proposes a new text classification model, RB-GAT (RoBERTa-BiGRU with Graph ATtention Network), which combines RoBERTa-BiGRU embeddings with a multi-head graph attention network (multi-head GAT). With this design, RB-GAT can better capture long-term dependencies and bidirectional information in texts, and can handle graph-structured data through the graph attention mechanism, thereby improving the accuracy of text classification.

### Formula Summary

1. **TF-IDF Calculation Formula**:
   \[ \text{TF-IDF}(w, d) = \text{TF}(w, d) \times \log\left(\frac{N}{\text{DF}(w)}\right) \]
   where $\text{TF}(w, d)$ is the frequency of word $w$ in document $d$, $N$ is the total number of documents, and $\text{DF}(w)$ is the number of documents containing word $w$.
2. **Pointwise Mutual Information (PMI) Calculation Formula**:
   \[ \text{PMI}(w_i, w_j) = \log\left(\frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}\right) \]
   where $p(w_i, w_j)$ is the probability of co-occurrence of words $w_i$ and $w_j$ in the context window, and $p(w_i)$ and $p(w_j)$ are the probabilities of words $w_i$ and $w_j$ appearing in the corpus, respectively.
3. **Edge Weight Definition**:
   \[ A_{ij} = \begin{cases} \text{PMI}(i, j) & \text{if } i, j \text{ are words and } \text{PMI}(i, j) > 0 \\ \text{TF-IDF}(i, j) & \text{if } i \text{ is a word, } j \text{ is a document} \\ 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \]
4. **BiGRU Output Formula**:
   \[ \vec{h}_i = [\vec{h}_i^F \,\|\, \vec{h}_i^B] \]
   where $\vec{h}_i^F$ and $\vec{h}_i^B$ are the hidden states of the forward and backward GRU, respectively.
5. **Graph Attention Coefficient Calculation Formula**:
   \[ e_{ij} = \text{LeakyReLU}\left(\vec{a}^T [W\vec{h}_i \,\|\, W\vec{h}_j]\right) \]
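
The TF-IDF, PMI, edge weight, and attention coefficient formulas above can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: TF is taken as a raw count, PMI probabilities are estimated from sliding windows over each document, word-document edges are stored in both directions, nodes are keyed as `("w", word)` and `("d", index)`, and the LeakyReLU negative slope defaults to 0.2. The function names `tf_idf`, `pmi`, `build_edges`, and `attention_logit` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return {(word, doc_index): TF(w, d) * log(N / DF(w))}, with raw counts as TF."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    weights = {}
    for d, doc in enumerate(docs):
        for w, tf in Counter(doc).items():
            weights[(w, d)] = tf * math.log(n_docs / df[w])
    return weights


def pmi(docs, window=3):
    """Return {(w_i, w_j): PMI}, estimating probabilities from sliding windows."""
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(doc)  # short document: one window
        else:
            windows.extend(doc[k:k + window] for k in range(len(doc) - window + 1))
    n_win = len(windows)
    single, pair = Counter(), Counter()
    for win in windows:
        vocab = sorted(set(win))  # count each word once per window
        single.update(vocab)
        pair.update(combinations(vocab, 2))
    scores = {}
    for (wi, wj), c in pair.items():
        p_ij = c / n_win
        p_i, p_j = single[wi] / n_win, single[wj] / n_win
        scores[(wi, wj)] = math.log(p_ij / (p_i * p_j))
    return scores


def build_edges(docs, window=3):
    """Sparse edge weights A_ij per the piecewise definition; missing keys mean 0."""
    edges = {}
    for (wi, wj), score in pmi(docs, window).items():
        if score > 0:  # keep only positive-PMI word-word edges
            edges[(("w", wi), ("w", wj))] = score
            edges[(("w", wj), ("w", wi))] = score
    for (w, d), weight in tf_idf(docs).items():
        edges[(("w", w), ("d", d))] = weight  # word-document edge
        edges[(("d", d), ("w", w))] = weight  # assumed symmetric
    nodes = {("w", w) for doc in docs for w in doc}
    nodes |= {("d", d) for d in range(len(docs))}
    for node in nodes:
        edges[(node, node)] = 1.0  # self-loop, i = j
    return edges


def attention_logit(a, W, h_i, h_j, slope=0.2):
    """e_ij = LeakyReLU(a^T [W h_i || W h_j]) for list-based vectors and matrix."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]
    concat = matvec(W, h_i) + matvec(W, h_j)  # list concatenation plays the role of ||
    z = sum(ak * xk for ak, xk in zip(a, concat))
    return z if z > 0 else slope * z


docs = [
    ["graph", "attention", "network"],
    ["graph", "neural", "network"],
    ["text", "classification"],
]
edges = build_edges(docs)
```

A dictionary keyed by node pairs keeps the sketch short; a real pipeline would more likely store $A$ as a sparse matrix and compute $e_{ij}$ with batched tensor operations.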

RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention Network

NGAT: Attention in Breadth and Depth Exploration for Semi-Supervised Graph Representation Learning

RLGAT: Retweet Prediction in Social Networks Using Representation Learning and GATs

Multilabel Text Classification Using Multilayer DGAT

Chinese text classification by combining Chinese-BERTology-wwm and GCN

geoGAT: Graph Model Based on Attention Mechanism for Geographic Text Classification

A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU

Text Classification Based on Knowledge Graphs and Improved Attention Mechanism

CRAN: A Hybrid CNN-RNN Attention-Based Model for Text Classification

Research on Text Classification Based on BERT-BiGRU Model

BertGCN: Transductive Text Classification by Combining GNN and BERT

Dual-channel and multi-granularity gated graph attention network for aspect-based sentiment analysis

Enhanced Text Classification with Label-Aware Graph Convolutional Networks

Recursive Graphical Neural Networks for Text Classification

VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification

RRL-GAT: Graph Attention Network-driven Multi-Label Image Robust Representation Learning

A Robust graph attention network with dynamic adjusted Graph

GAProtoNet: A Multi-head Graph Attention-based Prototypical Network for Interpretable Text Classification

PD-GATv2: positive difference second generation graph attention network based on multi-granularity in information systems to classification

Quadratic Graph Attention Network (Q-GAT) for Robust Construction of Gene Regulatory Networks

CBGT-Net: A Neuromimetic Architecture for Robust Classification of Streaming Data