Abstract: Humans are accustomed to reading and writing in a forward manner, and this natural bias extends to text understanding in auto-regressive large language models (LLMs). This paper investigates whether LLMs, like humans, struggle with reverse modeling, specifically with reversed text inputs. We found that publicly available pre-trained LLMs cannot understand such inputs. However, LLMs trained from scratch with both forward and reverse texts can understand them equally well during inference. Our case study shows that different-content texts result in different losses if input (to LLMs) in different directions -- some get lower losses for forward while some for reverse. This leads us to a simple and nice solution for data selection based on the loss differences between forward and reverse directions. Using our selected data in continued pretraining can boost LLMs' performance by a large margin across different language understanding benchmarks.

What problem does this paper attempt to address?

This paper examines how large language models (LLMs) handle reverse modeling. The researchers simulate reverse modeling by reversing text sequences and ask four main questions:

1. **Can LLMs perform reverse modeling?** The researchers compare two pre-training settings: continued training from a pre-trained model, and training from scratch. When trained from scratch, LLMs handle forward and reverse input texts equally well; with continued training, they perform poorly on reverse input.
2. **Does the data domain affect the reverse-modeling ability of LLMs?** The researchers used datasets covering multiple domains (such as books, academic papers, and web-scraped texts) and found that texts from different domains behave differently under reverse modeling. In particular, high-quality texts (such as books and academic papers) perform better in the reverse direction.
3. **What features make texts easier to process in the reverse direction?** By analyzing the loss differences between directions, the researchers found that texts with clear structure and logical coherence perform better under reverse modeling. These texts are usually of high quality, which simplifies the reverse-prediction task.
4. **Can reverse modeling improve the performance of the original LLMs?** The researchers proposed a data-selection strategy based on reverse-loss differences and showed experimentally that continued pre-training on the selected high-quality data can significantly improve LLM performance on multiple language-understanding benchmarks.

### Main contributions

1. **Showing that LLMs can handle forward and reverse texts equally well when trained from scratch.**
2. **Proposing a data-selection method based on reverse-loss differences, which can further improve the performance of LLMs.**

### Conclusion

This study reveals the potential of LLMs in reverse-modeling tasks and provides a new data-selection strategy that helps improve LLM performance. The study also points out some limitations: simple text reversal may not fully simulate complex reverse cognitive processes, and reverse-modeling training may pose computational-resource challenges. Future research can explore more sophisticated ways to simulate reverse modeling and consider additional evaluation metrics.
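The data-selection idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that reversal is done at the token level, that per-text losses come from two caller-supplied scoring functions (`forward_loss` and `reverse_loss`, hypothetical stand-ins for a forward-trained and a reverse-trained model), and that texts which are comparatively easier in the reverse direction are the ones to keep. The sign convention and the `keep_fraction` parameter are also assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Tokens = List[str]
LossFn = Callable[[Tokens], float]  # hypothetical: maps a token list to an average loss


def reverse_tokens(tokens: Sequence[str]) -> Tokens:
    """Reverse a text at the token level (assumed reversal granularity)."""
    return list(reversed(tokens))


def loss_gap(tokens: Sequence[str], forward_loss: LossFn, reverse_loss: LossFn) -> float:
    """Forward loss minus reverse loss.

    A positive value means the text scored a lower loss when read in reverse.
    """
    return forward_loss(list(tokens)) - reverse_loss(reverse_tokens(tokens))


def select_by_loss_gap(
    corpus: Sequence[Tokens],
    forward_loss: LossFn,
    reverse_loss: LossFn,
    keep_fraction: float = 0.5,
) -> List[Tokens]:
    """Keep the fraction of texts with the largest forward-minus-reverse loss gap."""
    scored = [(loss_gap(doc, forward_loss, reverse_loss), i) for i, doc in enumerate(corpus)]
    # Sort by descending gap; ties are broken by original position for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    n_keep = max(1, int(len(corpus) * keep_fraction))
    return [corpus[i] for _, i in scored[:n_keep]]
```

In practice the two loss functions would wrap real language models, and the selected texts would then be used for continued pre-training. The threshold and ranking direction would need to be checked against the paper itself.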

Reverse Modeling in Large Language Models

Reverse Training to Nurse the Reversal Curse

Delving into the Reversal Curse: How Far Can Large Language Models Generalize?

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

An Analysis and Mitigation of the Reversal Curse

Reverse Thinking Makes LLMs Stronger Reasoners

Evaluating and Mitigating Linguistic Discrimination in Large Language Models

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

Do Large Language Models Mirror Cognitive Language Processing?

Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?

Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge

Untying the Reversal Curse via Bidirectional Language Model Editing

Re-Thinking Inverse Graphics With Large Language Models

Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration

Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up

Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?

An Investigation of LLMs' Inefficacy in Understanding Converse Relations

How do Large Language Models Handle Multilingualism?

Serial Position Effects of Large Language Models

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models