Abstract: We investigate how to elicit compositional generalization capabilities in large language models (LLMs). Compositional generalization empowers LLMs to solve complex problems by combining foundational skills, a critical reasoning ability akin to human intelligence. However, even the most advanced LLMs currently struggle with this form of reasoning. We examine this problem within the framework of in-context learning and find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial. We refer to this prompt structure as skills-in-context (SKiC). With as few as two exemplars, this in-context learning structure enables LLMs to tackle more challenging problems requiring innovative skill combinations, achieving near-perfect systematic generalization across a broad range of tasks. Intriguingly, SKiC also unlocks the latent potential of LLMs, allowing them to more actively utilize pre-existing internal skills acquired during earlier pretraining stages to solve complex reasoning problems. The SKiC structure is robust across different skill constructions and exemplar choices and demonstrates strong transferability to new tasks. Finally, inspired by our in-context learning study, we show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization, enabling the models to solve much harder problems directly with standard prompting.

What problem does this paper attempt to address?

This paper addresses the limited compositional generalization ability of large language models (LLMs). Although existing LLMs perform well on many natural language processing (NLP) tasks, they still struggle to solve more complex, unseen problems by combining basic skills they already have. The paper's core objective is to study how in-context learning can be used to elicit better compositional reasoning from LLMs.

### Main contributions of the paper

1. **Proposing the Skills-in-Context (SKiC) structure**:
   - **Definition and structure**: SKiC is a new prompt structure with three main parts:
     1. **Basic skills**: Lists the basic skills required to solve complex tasks.
     2. **Composition examples**: Shows specific examples of how to combine these basic skills to solve complex problems.
     3. **Problem to be solved**: The actual problem that needs to be solved.
   - **Function**: By showing basic skills together with examples of how to compose them, SKiC helps LLMs explicitly ground each reasoning step in a basic skill within the context, which leads to stronger compositional generalization.
2. **Experimental verification**:
   - **Systematic generalization**: SKiC achieves near-perfect systematic generalization on a range of tasks, such as last-letter concatenation, addition, multiplication, and dynamic programming.
   - **Complex reasoning**: SKiC also shows significant gains on tasks that require internal skills from pretraining (such as GSM8K and MATH). Even when the provided skills are incomplete, LLMs can draw on their internal skills for reasoning.
3. **Beyond in-context learning**:
   - **Fine-tuning effect**: Inspired by the SKiC structure, fine-tuning LLMs on SKiC-annotated data further improves their generalization from simple to complex tasks, outperforming the traditional chain-of-thought (CoT) approach.
4. **Robustness and transferability**:
   - **Robustness**: SKiC is robust to different skill constructions and exemplar choices.
   - **Transferability**: SKiC performs well in cross-task transfer. Even when a prompt originally designed for one task is applied to a new task, it still outperforms traditional methods.

### Summary

By introducing the Skills-in-Context (SKiC) prompt structure, this paper addresses the difficulty LLMs have with compositional generalization, enabling them to combine basic skills more effectively within the in-context learning framework and solve complex problems. The method not only improves the models' generalization ability but also shows that LLMs can more actively use the internal skills acquired during pretraining, leading to better performance on a variety of tasks.
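
To make the three-part SKiC structure concrete, the following is a minimal Python sketch of how such a prompt might be assembled: a skills section, composition exemplars that name the skills they use, and the new problem. The names `Skill`, `Exemplar`, and `build_skic_prompt`, the section labels, and the example wording are all illustrative assumptions, not the paper's actual prompt format.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    # Hypothetical container: a short skill name plus a description or mini-demo.
    name: str
    description: str


@dataclass
class Exemplar:
    # Hypothetical container: a question and a worked solution whose steps
    # explicitly mention the skills they rely on.
    question: str
    solution: str


def build_skic_prompt(skills, exemplars, problem):
    """Assemble one prompt: basic skills, composition examples, then the problem."""
    parts = ["Skills:"]
    for i, skill in enumerate(skills, start=1):
        parts.append(f"Skill {i} <{skill.name}>: {skill.description}")
    parts.append("")
    parts.append("Examples:")
    for ex in exemplars:
        parts.append(f"Q: {ex.question}")
        parts.append(f"A: {ex.solution}")
        parts.append("")
    # The problem to be solved comes last, with an open answer slot.
    parts.append(f"Q: {problem}")
    parts.append("A:")
    return "\n".join(parts)


# Illustrative usage on a last-letter concatenation task (wording is assumed).
skills = [
    Skill("words_to_list", "Split the input into a list of words."),
    Skill("last_letter", "Return the last letter of a single word."),
]
exemplars = [
    Exemplar(
        "Concatenate the last letters of 'apple, banana'.",
        "Using <words_to_list>, the words are ['apple', 'banana']. "
        "Using <last_letter>, the last letters are 'e' and 'a'. "
        "The answer is 'ea'.",
    )
]
prompt = build_skic_prompt(
    skills, exemplars, "Concatenate the last letters of 'think, machine, learning'."
)
print(prompt)
```

The design point is that skills and exemplars share one context, and each exemplar step refers to a skill by name, so the model can link its reasoning steps to the listed skills when it answers the final question.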

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

In-Context Compositional Generalization for Large Vision-Language Models

Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

Can Models Learn Skill Composition from Examples?

Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks

Compositional Task Representations for Large Language Models

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning

Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization

Supervised Knowledge Makes Large Language Models Better In-context Learners

Latent Skill Discovery for Chain-of-Thought Reasoning

Large Language Models Are In-Context Semantic Reasoners Rather Than Symbolic Reasoners

Metacognitive Prompting Improves Understanding in Large Language Models

Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts

Large Language Models are Contrastive Reasoners

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning

Complementary Explanations for Effective In-Context Learning

Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models