Abstract: Deep learning models struggle with compositional generalization, i.e., the ability to recognize or generate novel combinations of observed elementary concepts. In hopes of enabling compositional generalization, various unsupervised learning algorithms have been proposed with inductive biases that aim to induce compositional structure in learned representations (e.g., disentangled representation and emergent language learning). In this work, we evaluate these unsupervised learning algorithms in terms of how well they enable compositional generalization. Specifically, our evaluation protocol focuses on whether or not it is easy to train a simple model on top of the learned representation that generalizes to new combinations of compositional factors. We systematically study three unsupervised representation learning algorithms - $\beta$-VAE, $\beta$-TCVAE, and emergent language (EL) autoencoders - on two datasets that allow directly testing compositional generalization. We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself. In addition, we find that previously proposed metrics for evaluating the level of compositionality are not correlated with actual compositional generalization in our framework. Surprisingly, we find that increasing pressure to produce a disentangled representation produces representations with worse generalization, while representations from EL models show strong compositional generalization. Taken together, our results shed new light on the compositional generalization behavior of different unsupervised learning algorithms with a new setting to rigorously test this behavior, and suggest the potential benefits of developing EL learning algorithms for more generalizable representations.

Consistency Regularization Training for Compositional Generalization

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

On Compositional Generalization of Neural Machine Translation

Towards Understanding the Relationship between In-context Learning and Compositional Generalization

Maintaining Reasoning Consistency in Compositional Visual Question Answering

In-Context Compositional Generalization for Large Vision-Language Models

Compositional Generalization by Learning Analytical Expressions

Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization

On compositional generalization of transformer-based neural machine translation

Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

A Study of Compositional Generalization in Neural Models

Data Factors for Better Compositional Generalization

Incorporating Consistency Verification into Neural Data-to-Document Generation

Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language

Learning to generalize to new compositions in image understanding

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?

Learning Algebraic Recombination for Compositional Generalization

Out-of-distribution generalization via composition: a lens through induction heads in Transformers

Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations