RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention Network

Shaoqing Lv, Jungang Dong, Chichi Wang, Xuanhong Wang, Zhiqiang Bao
DOI: https://doi.org/10.3390/s24113365
IF: 3.9
2024-05-25
Sensors
Abstract: With the development of deep learning, several graph neural network (GNN)-based approaches have been utilized for text classification. However, GNNs encounter challenges when capturing contextual text information within a document sequence. To address this, a novel text classification model, RB-GAT, is proposed by combining RoBERTa-BiGRU embedding and a multi-head Graph ATtention Network (GAT). First, the pre-trained RoBERTa model is exploited to learn word and text embeddings in different contexts. Second, the Bidirectional Gated Recurrent Unit (BiGRU) is employed to capture long-term dependencies and bidirectional sentence information from the text context. Next, the multi-head graph attention network is applied to analyze this information, which serves as a node feature for the document. Finally, the classification results are generated through a Softmax layer. Experimental results on five benchmark datasets demonstrate that our method can achieve an accuracy of 71.48%, 98.45%, 80.32%, 90.84%, and 95.67% on Ohsumed, R8, MR, 20NG, and R52, respectively, outperforming nine existing text classification approaches.
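The abstract describes a four-stage pipeline: RoBERTa embeddings, then BiGRU, then multi-head GAT, then Softmax. The block below is a minimal pure-Python sketch of the last three stages only. It is not the authors' implementation.

It makes these assumptions:
- The forward and backward BiGRU hidden states are random stand-ins in place of real RoBERTa-BiGRU outputs.
- The graph is a toy 3-node adjacency matrix with self-loops.
- Attention uses the standard GAT score $e_{ij}=\text{LeakyReLU}(\vec{a}^T[W\vec{h}_i \,\|\, W\vec{h}_j])$, normalized by a softmax over each node's neighbours.
- The helper names (`bigru_node_features`, `gat_head`, `multi_head_gat`, `rand_mat`) are hypothetical.
- The 0.2 LeakyReLU slope, concatenation of head outputs, and a linear classifier applied to node 0 are illustrative choices.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    # Negative slope 0.2 follows the original GAT paper; RB-GAT's value is assumed
    return x if x > 0 else slope * x


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def bigru_node_features(forward_states, backward_states):
    # h_i = [h_i^F || h_i^B]: concatenate forward and backward GRU hidden states
    return [hf + hb for hf, hb in zip(forward_states, backward_states)]


def gat_head(H, A, W, a):
    """One attention head: e_ij = LeakyReLU(a^T [W h_i || W h_j]), softmax over neighbours."""
    Wh = [matvec(W, h) for h in H]
    out, attn = [], []
    for i in range(len(H)):
        nbrs = [j for j in range(len(H)) if A[i][j] != 0]  # includes the self-loop A_ii = 1
        scores = [leaky_relu(sum(ak * zk for ak, zk in zip(a, Wh[i] + Wh[j]))) for j in nbrs]
        alpha = softmax(scores)
        attn.append(dict(zip(nbrs, alpha)))
        # New feature: attention-weighted sum of the neighbours' transformed features
        out.append([sum(alpha[k] * Wh[j][c] for k, j in enumerate(nbrs)) for c in range(len(W))])
    return out, attn


def multi_head_gat(H, A, heads):
    # Concatenate the K head outputs per node (concatenation rather than averaging is assumed)
    outs = [gat_head(H, A, W, a)[0] for W, a in heads]
    return [sum((o[i] for o in outs), []) for i in range(len(H))]


def rand_mat(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 3 nodes, 2-dim GRU states per direction, 2 heads of width 2, 2 classes
random.seed(0)
n, d_dir, d_out, K, C = 3, 2, 2, 2, 2
H = bigru_node_features(rand_mat(n, d_dir), rand_mat(n, d_dir))  # stand-ins for RoBERTa-BiGRU outputs
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # toy adjacency with self-loops
heads = [(rand_mat(d_out, 2 * d_dir), rand_mat(1, 2 * d_out)[0]) for _ in range(K)]
Z = multi_head_gat(H, A, heads)  # each node now has K * d_out features
V = rand_mat(C, K * d_out)  # hypothetical linear classifier weights
probs = softmax(matvec(V, Z[0]))  # class probabilities for node 0, treated as a document node
```

Hidden GAT layers usually also apply a nonlinearity such as ELU after the weighted sum. It is omitted here to keep the sketch short. Each row of attention weights sums to 1 over that node's neighbours only, which is how the adjacency matrix restricts where attention can flow.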
Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
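RB-GAT's graph attention layer operates on a text graph whose edge weights are listed in the formula summary further down this page:
- TF-IDF for word-document edges,
- positive PMI for word-word edges,
- 1 on the diagonal.

The block below is a minimal pure-Python sketch of that adjacency matrix. It makes these assumptions:
- Documents are already tokenized into lists.
- TF is the raw term count.
- The PMI probabilities are estimated from sliding-window counts.
- The window size of 3 is an illustrative value, not the paper's setting.
- The graph is undirected, so both $A_{ij}$ and $A_{ji}$ are filled.
- `build_text_graph` is a hypothetical helper name.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window_size=3):
    """Hypothetical helper: return (nodes, A) for a document-word graph.

    docs is a list of token lists. Nodes are documents first, then vocabulary words.
    """
    vocab = sorted({w for doc in docs for w in doc})
    nodes = [("doc", k) for k in range(len(docs))] + [("word", w) for w in vocab]
    index = {node: i for i, node in enumerate(nodes)}
    A = [[0.0] * len(nodes) for _ in nodes]
    for i in range(len(nodes)):
        A[i][i] = 1.0  # A_ij = 1 if i = j

    # Word-document edges: TF-IDF(w, d) = TF(w, d) * log(N / DF(w))
    N = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    for k, doc in enumerate(docs):
        for w, tf in Counter(doc).items():  # TF taken as the raw term count (assumption)
            i, j = index[("word", w)], index[("doc", k)]
            A[i][j] = A[j][i] = tf * math.log(N / df[w])

    # Word-word edges: PMI estimated from sliding-window counts (assumption), kept only if > 0
    windows = []
    for doc in docs:
        if len(doc) <= window_size:
            windows.append(doc)
        else:
            windows.extend(doc[s:s + window_size] for s in range(len(doc) - window_size + 1))
    num_windows = len(windows)
    word_count = Counter(w for win in windows for w in set(win))
    pair_count = Counter(p for win in windows for p in combinations(sorted(set(win)), 2))
    for (wi, wj), c in pair_count.items():
        p_ij = c / num_windows
        p_i, p_j = word_count[wi] / num_windows, word_count[wj] / num_windows
        pmi = math.log(p_ij / (p_i * p_j))
        if pmi > 0:
            i, j = index[("word", wi)], index[("word", wj)]
            A[i][j] = A[j][i] = pmi
    return nodes, A


docs = [["good", "movie", "plot"], ["bad", "movie", "acting"], ["good", "acting"]]
nodes, A = build_text_graph(docs, window_size=3)
idx = {node: i for i, node in enumerate(nodes)}
```

In this toy corpus, "plot" appears in only one of three documents, so its edge to document 0 gets weight $\log 3$. The pair "good"/"movie" co-occurs less often than independence would predict, which gives negative PMI, so no word-word edge is added. A word that occurs in every document gets $\log(N/N)=0$ from the TF-IDF rule and thus no effective word-document edge.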
What problem does this paper attempt to address?
This paper addresses the difficulties that existing text classification models face with graph-structured data. When a text graph contains multi-level relationships among documents, paragraphs, and words, traditional sequential deep-learning models may not handle these complex relationships effectively. Moreover, existing text classification methods based on graph neural networks (GNNs) usually initialize node features with simple schemes such as one-hot encoding. Such initialization produces high-dimensional, sparse feature matrices that cannot effectively express text similarity, and it ignores the fine-grained interactions between words in the document sequence. Specifically, the paper identifies the following problems:

1. **Challenges in Capturing Context Information**: Existing GNN methods struggle to capture context information in document sequences and perform especially poorly on long-range dependencies and bidirectional sentence information.
2. **Limitations of Feature Representations**: Initializing node features with one-hot encoding yields high-dimensional sparse matrices that cannot effectively express text similarity.
3. **Ignoring Sentiment-Rich Text Data**: Existing methods have difficulty with sentiment-rich text because they ignore the complex relationships among words in the document sequence.

To solve these problems, the paper proposes RB-GAT (RoBERTa-BiGRU with Graph ATtention Network), a text classification model that combines RoBERTa-BiGRU embeddings with a multi-head graph attention network (multi-head GAT). The RoBERTa-BiGRU embeddings let RB-GAT capture long-range dependencies and bidirectional information in text, while the graph attention mechanism handles the graph-structured data, thereby improving classification accuracy.

### Formula Summary

1. **TF-IDF Calculation Formula**: \[ \text{TF-IDF}(w, d)=\text{TF}(w, d)\times\log\left(\frac{N}{\text{DF}(w)}\right) \] where $\text{TF}(w, d)$ is the frequency of word $w$ in document $d$, $N$ is the total number of documents, and $\text{DF}(w)$ is the number of documents containing word $w$.
2. **Pointwise Mutual Information (PMI) Calculation Formula**: \[ \text{PMI}(w_i, w_j)=\log\left(\frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}\right) \] where $p(w_i, w_j)$ is the probability that words $w_i$ and $w_j$ co-occur in a context window, and $p(w_i)$ and $p(w_j)$ are the probabilities of $w_i$ and $w_j$, respectively, appearing in the corpus.
3. **Edge Weight Definition**: \[ A_{ij}=\begin{cases} \text{PMI}(i, j) & \text{if } i, j \text{ are words and } \text{PMI}(i, j)>0\\ \text{TF-IDF}(i, j) & \text{if } i \text{ is a word and } j \text{ is a document}\\ 1 & \text{if } i = j\\ 0 & \text{otherwise} \end{cases} \]
4. **BiGRU Output Formula**: \[ \vec{h}_i = [\vec{h}_i^F \,\|\, \vec{h}_i^B] \] where $\vec{h}_i^F$ and $\vec{h}_i^B$ are the hidden states of the forward and backward GRU, respectively, and $\|$ denotes concatenation.
5. **Graph Attention Coefficient Calculation Formula**: \[ e_{ij}=\text{LeakyReLU}\left(\vec{a}^T [W\vec{h}_i \,\|\, W\vec{h}_j]\right)