Abstract: Advancing Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that merging multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. We find that model merging yields new functionalities that neither parent model could achieve alone, resulting in improved performance in domain-specific assessments. We present experiments with different model architectures, including Llama 3.1 8B and Mistral 7B models, where similar behaviors are observed. To test whether the results also hold for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily feature emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment reveals detailed insights into how different model variants perform and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts based on disparate biological material design concepts, to create new microstructures, architectural concepts, and urban design based on biological materials-inspired construction principles.

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Methodology of Adapting Large English Language Models for Specific Cultural Contexts

Transforming and Combining Rewards for Aligning Large Language Models

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

LongReward: Improving Long-context Large Language Models with AI Feedback

Self-Updatable Large Language Models with Parameter Integration

Context-faithful Prompting for Large Language Models

Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs

Enhancing Function-Calling Capabilities in LLMs: Strategies for Prompt Formats, Data Integration, and Multilingual Translation

Role Prompting Guided Domain Adaptation with General Capability Preserve for Large Language Models

Empower Your Model with Longer and Better Context Comprehension

Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Supervised Knowledge Makes Large Language Models Better In-context Learners

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

Efficiently Adapting Pretrained Language Models To New Languages

Introspective Tips: Large Language Model for In-Context Decision Making

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition