Abstract: The advancement of Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on the development of fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that merging multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. We find that model merging yields new functionalities that neither parent model could achieve alone, resulting in improved performance in domain-specific assessments. Experiments with different model architectures are presented, including Llama 3.1 8B and Mistral 7B models, where similar behaviors are observed. To test whether the results also hold for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily feature emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment reveals detailed insights into how different model variants perform and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts based on disparate biological material design concepts, to create new microstructures, architectural concepts, and urban design based on biological materials-inspired construction principles.
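
To make the idea of merging two fine-tuned models concrete, the following is a minimal sketch only. The abstract does not state which merging method is used; the sketch assumes spherical linear interpolation (SLERP) of corresponding weight tensors, which is one common choice. The names `slerp` and `merge_state_dicts`, the plain-list weight representation, and the interpolation factor `t` are all hypothetical and chosen for illustration.

```python
import math


def slerp(w_a, w_b, t, eps=1e-6):
    """Spherically interpolate between two flat weight vectors (lists of floats).

    t = 0 returns w_a, t = 1 returns w_b. Falls back to plain linear
    interpolation when either vector is near zero or the two are near-parallel.
    """
    lerp = [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]
    norm_a = math.sqrt(sum(a * a for a in w_a))
    norm_b = math.sqrt(sum(b * b for b in w_b))
    if norm_a < eps or norm_b < eps:
        return lerp
    # Angle between the two weight vectors, clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(w_a, w_b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return lerp
    coef_a = math.sin((1.0 - t) * omega) / sin_omega
    coef_b = math.sin(t * omega) / sin_omega
    return [coef_a * a + coef_b * b for a, b in zip(w_a, w_b)]


def merge_state_dicts(sd_a, sd_b, t=0.5):
    """Merge two hypothetical parameter dicts (name -> flat weight list).

    Assumes both parents share the same architecture, so every parameter
    name in sd_a also exists in sd_b with the same length.
    """
    return {name: slerp(sd_a[name], sd_b[name], t) for name in sd_a}


# Toy usage with two made-up "parent models" holding a single parameter each.
parent_a = {"layer.weight": [1.0, 0.0]}
parent_b = {"layer.weight": [0.0, 1.0]}
merged = merge_state_dicts(parent_a, parent_b, t=0.5)
```

In practice the parents would be full checkpoints of the same base architecture (for example, an SFT variant and a preference-optimized variant), and the merge would run tensor by tensor over their parameters; the toy dictionaries here only show the arithmetic.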

AgentRefine: Enhancing Agent Generalization through Refinement Tuning

AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

Fine-grained LLM Agent: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

Training Language Model Agents without Modifying Language Models

Iterative Experience Refinement of Software-Developing Agents

LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

Unveiling the Generalization Power of Fine-Tuned Large Language Models

Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example

Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

Learning to Refine with Fine-Grained Natural Language Feedback