Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers

Osman Batur İnce, Tanin Zeraati, Semih Yagcioglu, Yadollah Yaghoobzadeh, Erkut Erdem, Aykut Erdem
2023-10-19
Abstract: Neural networks have revolutionized language modeling and excelled in various downstream tasks. However, the extent to which these models achieve compositional generalization comparable to human cognitive abilities remains a topic of debate. While existing approaches in the field have mainly focused on novel architectures and alternative learning paradigms, we introduce a pioneering method harnessing the power of dataset cartography (Swayamdipta et al., 2020). By strategically identifying a subset of compositional generalization data using this approach, we achieve a remarkable improvement in model accuracy, yielding enhancements of up to 10% on CFQ and COGS datasets. Notably, our technique incorporates dataset cartography as a curriculum learning criterion, eliminating the need for hyperparameter tuning while consistently achieving superior performance. Our findings highlight the untapped potential of dataset cartography in unleashing the full capabilities of compositional generalization within Transformer models. Our code is available at https://github.com/cyberiada/cartography-for-compositionality.
Computation and Language
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

### Research Background and Objectives

- **Research Background**: Neural networks have made significant progress in language modeling and perform well on various downstream tasks, but whether they can achieve compositional generalization similar to human cognitive abilities remains debated.
- **Objectives**: The paper proposes an approach that uses dataset cartography to improve the performance of Transformer models on compositional generalization tasks.

### Specific Issues

- **Core Issue**: How to use dataset cartography to improve the performance of Transformer models on compositional generalization tasks.
- **Technical Challenges**:
  - How to select a subset of training samples in compositional generalization tasks so that model performance improves.
  - How to apply dataset cartography to sequence-to-sequence (seq2seq) generation tasks, especially when dealing with synthetic datasets.
  - How different confidence measures (such as Inverse Perplexity, CHIA, and BLEU) affect the resulting dataset cartography, and which measures are better suited to compositional generalization settings.

### Method Overview

- **Application of Dataset Cartography**: The authors construct curricula by observing the training dynamics of each instance during training and selecting specific samples from the full training set.
- **Experimental Setup**: Synthetic datasets (such as CFQ and COGS) are used for the experiments. These datasets avoid the issues of manually labeled data and are relatively small, which makes it harder for a smaller subset to match the performance of the full training set.
- **Contributions**:
  - Introduces a new use of dataset cartography, as both a curriculum learning criterion and a sample selection strategy, to enhance compositional generalization.
  - Provides an in-depth analysis of different confidence measures for sequence tasks, with insights into quantifying sequence difficulty and developing robust training strategies.
  - Demonstrates that exploiting training dynamics through dataset cartography significantly affects the compositional generalization ability of Transformer models.

### Experimental Results

- **Key Findings**:
  - Selecting "hard-to-learn" samples as a training subset can significantly improve model performance, in some cases even surpassing training on the full dataset.
  - Inverse Perplexity is a more effective confidence measure than CHIA or BLEU.
  - Combining subsets of "hard-to-learn" and "easy-to-learn" samples also performed well, but the "hard-to-learn" subset performed best across all measures.
  - Using dataset cartography for curriculum learning also helps improve model performance, especially when training prioritizes "hard-to-learn" samples in the initial stages.

In summary, the paper uses dataset cartography to improve the performance of Transformer models on compositional generalization tasks, particularly through sample selection and curriculum learning strategies, and demonstrates the effectiveness of both through experiments.
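
The cartography statistics behind the sample selection described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes that inverse perplexity is computed as the geometric mean of the gold-token probabilities of a target sequence, that confidence and variability are the mean and standard deviation of that score across epochs (following Swayamdipta et al., 2020), and that "hard-to-learn" means the lowest-confidence examples. The function names `inverse_perplexity`, `cartography_stats`, and `select_hard_to_learn` are hypothetical.

```python
import math
from statistics import mean, pstdev


def inverse_perplexity(token_probs):
    """Return 1 / perplexity for one target sequence.

    token_probs holds the model's probability for each gold token.
    Perplexity is exp(-mean(log p)), so its inverse is the geometric
    mean of the token probabilities.
    """
    avg_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_log_prob)


def cartography_stats(per_epoch_token_probs):
    """Compute (confidence, variability) for a single training example.

    per_epoch_token_probs: one list of gold-token probabilities per epoch.
    Confidence is the mean score over epochs; variability is the
    (population) standard deviation of the score over epochs.
    """
    scores = [inverse_perplexity(probs) for probs in per_epoch_token_probs]
    return mean(scores), pstdev(scores)


def select_hard_to_learn(stats, fraction):
    """Return the ids of the lowest-confidence `fraction` of examples.

    stats maps example id -> (confidence, variability).
    Assumption: hard-to-learn is ranked by ascending confidence only.
    """
    k = max(1, int(len(stats) * fraction))
    ranked = sorted(stats, key=lambda example_id: stats[example_id][0])
    return ranked[:k]


# Toy usage with made-up probabilities for two examples over two epochs.
toy_dynamics = {
    "ex_easy": [[0.9, 0.9], [0.95, 0.95]],
    "ex_hard": [[0.1, 0.2], [0.2, 0.1]],
}
toy_stats = {ex: cartography_stats(d) for ex, d in toy_dynamics.items()}
hard_subset = select_hard_to_learn(toy_stats, fraction=0.5)
```

In practice the per-token probabilities would be logged during training of the seq2seq model, and the selected ids would define the training subset. The toy numbers here are illustrative only.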