Abstract: Recent diagnostic datasets on compositional generalization, such as SCAN (Lake and Baroni, 2018) and COGS (Kim and Linzen, 2020), expose severe problems in models trained from scratch on these datasets. However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability. In this work, to reconcile this inconsistency, we conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors, including dataset scale, pattern complexity, example difficulty, etc. First, we show that increased dataset complexity can lead to better generalization behavior on multiple different generalization challenges. To further understand this improvement, we show two axes of the benefit from more complex datasets: they provide more diverse examples so compositional understanding becomes more effective, and they also prevent ungeneralizable memorization of the examples due to reduced example repetition frequency. Finally, we explore how training examples of different difficulty levels influence generalization differently. On synthetic datasets, simple examples invoke stronger compositionality than hard examples do. On larger-scale real language datasets, while hard examples become more important, potentially because they ensure decent data coverage, a balanced mixture of simple and hard examples manages to induce the strongest generalizability. The code and data for this work are available at https://github.com/owenzx/data4comp.
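The frequency argument in the abstract (more complex datasets mean each example repeats less often) can be illustrated with a toy calculation. The sketch below is not the authors' code: the SCAN-style templates, the `prim{i}` primitive names, and the helpers `build_pool` and `mean_repetition` are hypothetical. It fixes a training budget, samples commands uniformly with replacement from all template/primitive combinations, and reports how many times each distinct sampled command occurs on average.

```python
import itertools
import random
from collections import Counter
from typing import List

# Hypothetical SCAN-style command templates; "{x}" is filled with a primitive.
TEMPLATES = ["{x}", "{x} twice", "{x} thrice", "{x} left", "{x} right"]


def build_pool(num_primitives: int) -> List[str]:
    """All unique commands formed from TEMPLATES and `num_primitives` primitives."""
    primitives = [f"prim{i}" for i in range(num_primitives)]
    return [t.format(x=p) for t, p in itertools.product(TEMPLATES, primitives)]


def mean_repetition(num_primitives: int, budget: int, seed: int = 0) -> float:
    """Sample `budget` training examples with replacement and return the
    average number of occurrences per distinct example that was sampled."""
    rng = random.Random(seed)
    pool = build_pool(num_primitives)
    counts = Counter(rng.choice(pool) for _ in range(budget))
    return budget / len(counts)


# Same budget, increasingly many primitives: repetition per example drops.
for k in (4, 40, 400):
    print(k, round(mean_repetition(k, budget=2000), 2))
```

With a fixed budget, adding primitives enlarges the pool of unique examples, so each one is seen fewer times; this is the repetition-frequency effect the abstract links to reduced memorization, shown here only as arithmetic on a made-up grammar.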

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the weakness of neural sequence-to-sequence (seq2seq) models at compositional generalization. Specifically, when these models are trained from scratch, they perform very poorly on test examples that contain new combinations of seen elements. In contrast, models trained or pre-trained on larger and more general datasets show better compositional generalization. This paper therefore empirically analyzes how data factors (such as dataset scale, pattern complexity, and example difficulty) affect the generalization ability of Transformer models trained from scratch, in order to explain why more complex datasets improve compositional generalization.

### Main contributions of the paper

1. **Relationship between dataset complexity and generalization ability**: Increasing dataset complexity significantly improves the model's performance on several different generalization challenges. More complex datasets provide more diverse examples, which makes compositional understanding more effective, and they reduce how often individual examples repeat, which prevents ungeneralizable memorization.
2. **Analysis of the advantages of complex datasets**: The authors propose two hypotheses to explain why more complex datasets improve generalization:
   - **Diversity hypothesis**: More unique patterns in the dataset (for example, more unique primitive words) make surface-level memorization harder.
   - **Frequency hypothesis**: Larger datasets lower the frequency with which similar examples are seen, which keeps them from being memorized.
3. **The impact of training examples of different difficulty on generalization**: On synthetic datasets, simple examples promote compositional generalization more than hard examples do. On large-scale real-language datasets, simple examples alone are not sufficient for good performance, but a mixture of simple and hard examples induces the strongest generalization.

### Experimental design and results

- **Experimental design**: The authors ran experiments on several datasets: the synthetic dataset SCAN and its extended version SCAN*, and the real-language datasets GeoQuery, ATIS, and SMCalFlow. They controlled dataset complexity and example difficulty and observed the model's generalization performance.
- **Results**: Increasing dataset complexity significantly improves generalization, especially on compositional generalization tasks. Reducing frequently occurring examples improves generalization further.

### Conclusion

Through empirical study, this paper shows that dataset complexity has a positive effect on compositional generalization, and it proposes a simple data augmentation method, AugZero, which also effectively improves generalization without introducing additional knowledge. These findings help in understanding and improving model generalization in natural language processing tasks.
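The balanced mixture of simple and hard examples described above can be sketched as a data-selection step. This is a minimal illustration under stated assumptions, not the paper's procedure: the difficulty proxy (input length in tokens), the median split, the 50/50 default ratio, and the helpers `difficulty`, `split_by_difficulty`, and `balanced_mixture` are all hypothetical choices made for the example.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (input command, output program)


def difficulty(example: Example) -> int:
    """Hypothetical difficulty proxy: input length in whitespace tokens."""
    return len(example[0].split())


def split_by_difficulty(examples: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Rank examples by the proxy and cut at the midpoint into simple/hard halves."""
    ranked = sorted(examples, key=difficulty)
    mid = len(ranked) // 2
    return ranked[:mid], ranked[mid:]


def balanced_mixture(examples: List[Example], size: int,
                     simple_ratio: float = 0.5, seed: int = 0) -> List[Example]:
    """Draw `size` examples, a `simple_ratio` share from the simple half and
    the rest from the hard half, then shuffle them together."""
    simple, hard = split_by_difficulty(examples)
    n_simple = round(size * simple_ratio)
    n_hard = size - n_simple
    if n_simple > len(simple) or n_hard > len(hard):
        raise ValueError("not enough examples for the requested mixture")
    rng = random.Random(seed)
    mixture = rng.sample(simple, n_simple) + rng.sample(hard, n_hard)
    rng.shuffle(mixture)
    return mixture


# Toy data: commands of length 1..10 ("walk", "walk walk", ...).
data = [(" ".join(["walk"] * k), " ".join(["WALK"] * k)) for k in range(1, 11)]
mix = balanced_mixture(data, size=4)
print([difficulty(e) for e in mix])
```

Setting `simple_ratio` to 1.0 or 0.0 gives the simple-only and hard-only conditions, so the same helper could be used to compare all three training regimes on a fixed budget.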

Data Factors for Better Compositional Generalization

Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers

Towards Understanding the Relationship between In-context Learning and Compositional Generalization

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

Unobserved Local Structures Make Compositional Generalization Hard

Improving Compositional Generalization in Math Word Problem Solving

Compositional Generalization for Multi-label Text Classification: A Data-Augmentation Approach

Enhancing Compositional Generalization via Compositional Feature Alignment

Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization

Consistency Regularization Training for Compositional Generalization

How Do In-Context Examples Affect Compositional Generalization?

Compositional Generalization for Data-to-Text Generation

Compositional Generalization by Learning Analytical Expressions

Out-of-distribution generalization via composition: a lens through induction heads in Transformers

On compositional generalization of transformer-based neural machine translation

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

When does compositional structure yield compositional generalization? A kernel theory

Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language

On Compositional Generalization of Neural Machine Translation

A General Theory for Compositional Generalization

CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations