Abstract: The advancement of Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on the development of fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that merging multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. Specifically, we find that model merging yields new functionalities that neither parent model could achieve alone, which improves performance in domain-specific assessments. We present experiments with different model architectures, including Llama 3.1 8B and Mistral 7B models, and observe similar behaviors in both. To test whether these results also hold for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily exhibit emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment provides detailed insights into how different model variants perform and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts based on disparate biological material design concepts, used to create new microstructures, architectural concepts, and urban designs based on biological materials-inspired construction principles.

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities