Abstract: The advancement of Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on the development of fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that merging multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. We find that model merging yields new functionalities that neither parent model could achieve alone, which improves performance in domain-specific assessments. Experiments with different model architectures are presented, including Llama 3.1 8B and Mistral 7B models, where similar behaviors are observed. To test whether the results also hold for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily feature emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment provides detailed insights into how different model variants perform and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts based on disparate biological material design concepts, to create new microstructures, architectural concepts, and urban design based on biological materials-inspired construction principles.

Structure-aware Domain Knowledge Injection for Large Language Models

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

Enhancing LLM's Cognition via Structurization

SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models

KnowTuning: Knowledge-aware Fine-tuning for Large Language Models

Supervised Knowledge Makes Large Language Models Better In-context Learners

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning

Struct-X: Enhancing Large Language Models Reasoning with Structured Data

Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs

StructGPT: A General Framework for Large Language Model to Reason over Structured Data

Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach

Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models

Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models

StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation