Abstract: Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end or vice versa sometimes, which makes it inefficient to process long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model which dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from preceding texts, following texts and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, with five benchmark data sets. The experimental results show that our model reads faster and predicts better than standard LSTM. Compared to previous models which can also skip words, our model achieves better trade-offs between performance and efficiency.
What problem does this paper attempt to address?
This paper addresses the inefficiency of processing long texts in text classification. Standard recurrent neural networks (RNNs) and variants such as long short-term memory networks (LSTMs) read a text word by word from beginning to end. This is slow on long documents, and for classification many of the words are irrelevant to the task and can be skipped. The authors therefore propose Leap-LSTM, which dynamically skips unimportant words in order to improve both processing efficiency and prediction performance.
### Main contributions
1. **Proposing the Leap-LSTM model**: The model builds on the standard LSTM and dynamically skips unimportant words while reading a text. "Leap" refers to the model's ability to skip over words and also signals an improvement over the LSTM.
2. **Experimental results**: Leap-LSTM is faster at inference and also predicts better than the standard LSTM on multiple text classification tasks. Compared with other word-skipping models, it skips more unimportant words and achieves a better balance between performance and efficiency.
3. **Exploring the reasons for performance improvement**: Through extensive experiments, the authors offer a new explanation for why Leap-LSTM performs better in some cases, namely that dynamically changing the training samples helps the model. They introduce a new training scheme, schedule-training, to test this hypothesis.
### Method overview
- **Model architecture**: Leap-LSTM is based on the standard LSTM, but at each time step it computes the probability of skipping the current word. Specifically, it combines features of the preceding text, the following text and the current word, and feeds them to a multi-layer perceptron (MLP) that outputs the probabilities of skipping and keeping the word.
- **Feature encoders**: To extract features of the preceding and following text efficiently, the model uses an LSTM and a convolutional neural network (CNN). The preceding-text feature is simply the previous hidden state \( h_{t-1} \). The following-text features are split into a local part, encoded by a CNN, and a global part, encoded by a reverse-direction LSTM.
- **Relaxation of discrete variables**: Because the skip-or-keep decision must be sampled from a categorical distribution, the model approximates the sample with the Gumbel-Softmax distribution, which keeps the entire model differentiable.
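The skip decision described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer MLP with a ReLU activation, the function names (`skip_probabilities`, `gumbel_softmax`), the toy weights, the `[keep, skip]` output order and the temperature value are all hypothetical. The feature vector stands in for the concatenation of the preceding-text, following-text and current-word features.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def skip_probabilities(features, w_hidden, b_hidden, w_out, b_out):
    """Assumed two-layer MLP mapping the features to [p_keep, p_skip]."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU (assumption)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    return softmax(logits)


def sample_gumbel(rng):
    """Gumbel(0, 1) noise via inverse transform: g = -log(-log(u))."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return -math.log(-math.log(u))


def gumbel_softmax(probs, temperature, rng):
    """Relaxed (soft) sample from a categorical distribution.

    Lower temperatures push the output closer to a one-hot vector.
    """
    noisy = [(math.log(p + 1e-12) + sample_gumbel(rng)) / temperature for p in probs]
    return softmax(noisy)


# Toy example with made-up numbers.
rng = random.Random(0)
features = [0.2, -0.1, 0.4, 0.3]  # stands in for [preceding; following; current word]
w_hidden = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
b_hidden = [0.0, 0.1]
w_out = [[0.3, -0.5], [-0.2, 0.4]]
b_out = [0.0, 0.0]

probs = skip_probabilities(features, w_hidden, b_hidden, w_out, b_out)
relaxed = gumbel_softmax(probs, temperature=0.5, rng=rng)
skip = relaxed[1] > relaxed[0]  # hard decision taken from the relaxed sample
print(probs, relaxed, skip)
```

In a differentiable framework the relaxed sample would carry gradients back into the MLP, which is the point of the relaxation. The pure-Python version only shows the sampling arithmetic.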
### Experimental results
- **Model performance**: On five benchmark datasets, Leap-LSTM skips about 60% or 90% of the words without a significant drop in accuracy, and on some datasets it even exceeds the standard LSTM. In particular, on AGNews, DBPedia and Yahoo, Leap-LSTM achieves higher accuracy together with a 1.5x to 1.7x speed-up.
- **Comparison with related models**: Compared with other word-skipping models (such as LSTM-Jump, Skip RNN and Skim-RNN), Leap-LSTM performs well in both accuracy and efficiency. On AGNews, for example, it skips 57.08% of the words while reaching 93.64% accuracy and a 1.5x speed-up.
- **Schedule-training**: With the schedule-training scheme, the authors further test whether dynamically changing the training samples improves model performance. An LSTM trained with schedule-training outperforms the standard LSTM on all tasks and comes close to Leap-LSTM.
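As a rough sanity check on the AGNews figures above (57.08% of words skipped, 1.5x speed-up), the skip rate can be converted into an idealized upper bound on the speed-up. The sketch assumes that skipped words cost nothing and that every kept word costs the same as in a standard LSTM. The function name `ideal_speedup` is hypothetical, and the assumption is a simplification that is not taken from the paper.

```python
def ideal_speedup(skip_rate):
    """Upper bound on speed-up if skipped words were free (simplifying assumption)."""
    if not 0.0 <= skip_rate < 1.0:
        raise ValueError("skip_rate must be in [0, 1)")
    return 1.0 / (1.0 - skip_rate)


agnews_skip_rate = 0.5708  # reported fraction of skipped words on AGNews
bound = ideal_speedup(agnews_skip_rate)
print(f"{bound:.2f}x")  # prints "2.33x"
```

The reported 1.5x is below this bound. One plausible reading, which the summary does not confirm, is that deciding whether to skip a word has its own cost because the feature encoders and the MLP run at every step.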
### Conclusion
By dynamically skipping unimportant words, Leap-LSTM improves both the processing efficiency and the prediction performance of text classification. Its main innovation is to combine information from the preceding text, the following text and the current word when deciding whether to skip a word, which makes the decision more accurate. In addition, the schedule-training experiments show that dynamically changing the training samples has a positive effect on model performance.