Reverse Modeling in Large Language Models

Sicheng Yu, Yuanchen Xu, Cunxiao Du, Yanying Zhou, Minghui Qiu, Qianru Sun, Hao Zhang, Jiawei Wu
2024-10-13
Abstract: Humans are accustomed to reading and writing in a forward manner, and this natural bias extends to text understanding in auto-regressive large language models (LLMs). This paper investigates whether LLMs, like humans, struggle with reverse modeling, specifically with reversed text inputs. We found that publicly available pre-trained LLMs cannot understand such inputs. However, LLMs trained from scratch with both forward and reverse texts can understand them equally well during inference. Our case study shows that different-content texts result in different losses if input (to LLMs) in different directions -- some get lower losses for forward while some for reverse. This leads us to a simple and nice solution for data selection based on the loss differences between forward and reverse directions. Using our selected data in continued pretraining can boost LLMs' performance by a large margin across different language understanding benchmarks.
Computation and Language
What problem does this paper attempt to address?
This paper examines how well large language models (LLMs) handle reverse modeling. The researchers simulate reverse modeling by reversing text sequences and pose four main questions:

1. **Can LLMs perform reverse modeling?**
   - The researchers test two pre-training setups: continued training from a pre-trained model and training from scratch. When trained from scratch, LLMs handle forward and reversed input equally well; under continued training, they perform poorly on reversed input.
2. **Does the data domain affect the reverse-modeling ability of LLMs?**
   - The researchers used datasets covering multiple domains (books, academic papers, web-scraped text, and others) and found that reverse-modeling performance differs by domain. In particular, high-quality texts such as books and academic papers perform better in reverse modeling.
3. **What features make texts easier to process in the reverse direction?**
   - By analyzing per-text loss differences, the researchers found that texts with clear structure and logical coherence perform better in reverse modeling. These texts are usually of high quality, which simplifies the reverse-prediction task.
4. **Can reverse modeling improve the performance of the original LLMs?**
   - The researchers proposed a data-selection strategy based on the loss difference between the forward and reverse directions, and showed experimentally that continued pre-training on the selected high-quality data significantly improves LLM performance on multiple language-understanding benchmarks.

### Main contributions

1. **LLMs trained from scratch are shown to handle forward and reversed text equally well.**
2. **A data-selection method based on forward-reverse loss differences is proposed, which further improves LLM performance.**

### Conclusion

This study reveals the potential of LLMs in reverse-modeling tasks and provides a new data-selection strategy that helps improve LLM performance. It also notes limitations: simple text reversal may not fully simulate complex reverse cognitive processes, and reverse-modeling training adds computational cost. Future research can explore more sophisticated ways to simulate reverse modeling and consider additional evaluation metrics.
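
The loss-difference selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the token-level reversal, the score (forward loss minus reverse loss), the rule of keeping the highest-scoring fraction, and all names (`reverse_tokens`, `ScoredText`, `score_texts`, `select_top_fraction`) are assumptions made for illustration. The two loss functions stand in for a forward-trained and a reverse-trained model; here they are toy lookup tables.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

# Hypothetical type: any function mapping a token sequence to a mean LM loss.
LossFn = Callable[[Sequence[str]], float]


def reverse_tokens(tokens: Sequence[str]) -> List[str]:
    """Reverse a sequence at the token level (assumed granularity)."""
    return list(reversed(tokens))


@dataclass
class ScoredText:
    tokens: List[str]
    forward_loss: float
    reverse_loss: float

    @property
    def gap(self) -> float:
        # Positive gap: the reversed text was easier to predict than the original.
        return self.forward_loss - self.reverse_loss


def score_texts(
    texts: Iterable[Sequence[str]],
    forward_loss_fn: LossFn,
    reverse_loss_fn: LossFn,
) -> List[ScoredText]:
    """Score each text with a forward model and a reverse model."""
    return [
        ScoredText(
            tokens=list(tokens),
            forward_loss=forward_loss_fn(tokens),
            # The reverse model is given the tokens in reversed order.
            reverse_loss=reverse_loss_fn(reverse_tokens(tokens)),
        )
        for tokens in texts
    ]


def select_top_fraction(scored: List[ScoredText], keep_fraction: float) -> List[ScoredText]:
    """Keep the fraction of texts with the largest forward-minus-reverse gap."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(scored, key=lambda s: s.gap, reverse=True)
    return ranked[: math.ceil(len(ranked) * keep_fraction)]


if __name__ == "__main__":
    # Toy losses standing in for real model evaluations (made-up numbers).
    forward_table = {("a", "b", "c"): 2.0, ("x", "y"): 3.0}
    reverse_table = {("c", "b", "a"): 1.5, ("y", "x"): 3.5}
    scored = score_texts(
        [["a", "b", "c"], ["x", "y"]],
        lambda toks: forward_table[tuple(toks)],
        lambda toks: reverse_table[tuple(toks)],
    )
    for item in select_top_fraction(scored, keep_fraction=0.5):
        print(item.tokens, item.gap)
```

In practice the two loss functions would wrap real language models evaluated on the same tokenization, and the kept fraction would be tuned; this summary does not state the threshold or ranking rule used in the paper.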