Abstract: The advancement of Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on the development of fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that merging multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. We find that model merging yields new functionalities that neither parent model could achieve alone, improving performance in domain-specific assessments. Experiments with different model architectures are presented, including Llama 3.1 8B and Mistral 7B models, where similar behaviors are observed. Exploring whether the results also hold for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily feature emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment reveals detailed insights into how different model variants perform and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts based on disparate biological material design concepts, used to create new microstructures, architectural concepts, and urban designs based on biological materials-inspired construction principles.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper explores how fine-tuning strategies can strengthen the capabilities of Large Language Models (LLMs) in specialized domains. Specifically, the researchers investigate the following methods:

1. **Continued Pre-Training (CPT)**:
   - Continuing to train the model on domain-specific data after the initial pre-training to strengthen its knowledge of that domain.
2. **Supervised Fine-Tuning (SFT)**:
   - Fine-tuning the pre-trained model on labeled datasets so that it performs specific tasks, such as question answering and reasoning, more effectively.
3. **Preference Optimization Methods**:
   - These include Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), which shape model behavior by learning directly from preference data or feedback.
4. **Model Merging**:
   - Combining multiple models from different training stages into a new model using Spherical Linear Interpolation (SLERP), which creates non-linear interactions between parameters that can give rise to new functionalities.

The main findings of the paper include:

- Model merging is not simply an aggregation process; it can significantly improve model performance, especially when combined with preference optimization strategies such as DPO and ORPO.
- SLERP performs well for model merging: it preserves the geometric relationships in the model parameter space and reveals new interaction effects between the parent models.
- Models of different scales behave differently after merging. Smaller models may not exhibit the same emergent capabilities, which suggests that model scale may be a key factor.

Through this systematic exploration, the researchers aim to develop LLMs that are better suited to the challenges of complex domains and to provide insights and directions for future research.
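The difference between CPT and SFT described above largely comes down to which tokens contribute to the training loss. The following is a minimal NumPy sketch, not the authors' training code: it assumes a single tokenized sequence, a model that returns one row of logits per input position, and the common SFT convention of masking prompt tokens so that only the response is supervised. The function names `next_token_targets` and `masked_nll` are hypothetical.

```python
import numpy as np


def next_token_targets(token_ids, prompt_len=0):
    """Shift a token sequence into (inputs, targets, loss_mask) for next-token prediction.

    prompt_len=0 supervises every token (CPT-style training on raw domain text).
    prompt_len>0 masks the prompt so only response tokens are supervised
    (an assumed, commonly used SFT convention).
    """
    ids = np.asarray(token_ids)
    inputs, targets = ids[:-1], ids[1:]
    # targets[j] is original token j+1; it is supervised only if it lies in the response
    loss_mask = np.arange(1, len(ids)) >= prompt_len
    return inputs, targets, loss_mask


def masked_nll(logits, targets, loss_mask):
    """Mean negative log-likelihood of targets[i] under logits[i], over masked-in positions.

    logits: (seq_len, vocab) unnormalized scores, one row per input position.
    """
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    token_logp = log_probs[np.arange(len(targets)), targets]
    return float(-(token_logp * loss_mask).sum() / loss_mask.sum())
```

With the same logits, the CPT-style mask averages the loss over every position, while the SFT-style mask ignores how well the model predicts the prompt itself.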
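The two preference objectives listed above can be compared at the level of a single preference pair. This is a hedged sketch of the published DPO and ORPO loss formulas, not the paper's implementation: it assumes the response log-likelihoods have already been computed by a model, and the defaults `beta=0.1` and `lam=0.1` are illustrative values, not settings reported in the paper. The structural difference it highlights is that DPO needs a frozen reference model, whereas ORPO adds an odds-ratio penalty to the ordinary SFT loss and needs no reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Arguments are summed log-probabilities log p(y|x) of the chosen and rejected
    responses under the trainable policy (pi_*) and a frozen reference model (ref_*).
    """
    # How much more the policy prefers the chosen response than the reference does
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) computed from log p; requires avg_logp < 0."""
    if avg_logp >= 0:
        raise ValueError("average log-probability must be negative")
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp, rejected_avg_logp, lam=0.1):
    """ORPO loss for one pair: SFT negative log-likelihood plus a weighted odds-ratio term.

    Inputs are length-normalized (mean per-token) log-likelihoods under the policy.
    """
    sft_term = -chosen_avg_logp
    ratio_term = -log_sigmoid(log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp))
    return sft_term + lam * ratio_term
```

In practice these per-pair losses are averaged over a batch. When the policy matches the reference exactly, the DPO loss equals log 2, and it falls below that as the policy shifts probability toward the chosen response.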
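SLERP merging, as summarized above, can be sketched tensor by tensor. This is a minimal NumPy sketch, assuming two parent checkpoints with identical parameter names and shapes stored as dictionaries of arrays; the helper names `slerp` and `merge_state_dicts`, the single global interpolation factor `t`, and the near-collinear fallback threshold are assumptions for illustration, and this is not the merging configuration used in the paper. It also gives one intuition for the geometric argument: for two unit-norm vectors, a plain average has a norm below 1, while SLERP stays on the unit sphere.

```python
import numpy as np


def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Spherically interpolate between two same-shape weight tensors.

    t=0 returns v0 and t=1 returns v1. The angle is measured between the
    normalized, flattened tensors; if they are nearly collinear, the sketch
    falls back to linear interpolation to avoid dividing by sin(theta) ~ 0.
    """
    a = v0.astype(np.float64).ravel()
    b = v1.astype(np.float64).ravel()
    cos_theta = np.dot(a / (np.linalg.norm(a) + eps), b / (np.linalg.norm(b) + eps))
    cos_theta = float(np.clip(cos_theta, -1.0, 1.0))
    if abs(cos_theta) > dot_threshold:
        merged = (1.0 - t) * a + t * b
    else:
        theta = np.arccos(cos_theta)
        merged = (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return merged.reshape(v0.shape).astype(v0.dtype)


def merge_state_dicts(parent_a, parent_b, t=0.5):
    """Merge two checkpoints with identical parameter names and shapes, tensor by tensor."""
    if parent_a.keys() != parent_b.keys():
        raise ValueError("parents must share the same parameter names")
    return {name: slerp(t, parent_a[name], parent_b[name]) for name in parent_a}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_a = {"layer.weight": rng.standard_normal((4, 4)).astype(np.float32)}
    model_b = {"layer.weight": rng.standard_normal((4, 4)).astype(np.float32)}
    merged = merge_state_dicts(model_a, model_b, t=0.5)
    print(merged["layer.weight"].shape)  # (4, 4)
```

Because the interpolation weights depend on the angle between the parents, the result is not a fixed linear blend, which is one way to read the "non-linear parameter interactions" mentioned in the summary.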

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

An Emulator for Fine-Tuning Large Language Models using Small Language Models

Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately

I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses

Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models

Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Structure-aware Domain Knowledge Injection for Large Language Models

The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities

A Fine-Tuned Large Language Model for Domain-Specific with Reinforcement Learning

Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model

The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities

Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

Large Language Models with Controllable Working Memory

Fine-tuning Large Language Models for Entity Matching

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models

Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

Fine-Tuning Large Language Models in Education

Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning