A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han

2024-09-29

Abstract: In many scientific fields, large language models (LLMs) have revolutionized the way text and other modalities of data (e.g., molecules and proteins) are handled, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one or two fields or a single modality. In this paper, we aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs regarding their architectures and pre-training techniques. To this end, we comprehensively survey over 260 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality. Moreover, we investigate how LLMs have been deployed to benefit scientific discovery. Resources related to this survey are available at https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models.

Computation and Language

What problem does this paper attempt to address?

This paper aims to give a comprehensive view of large language models (LLMs) across scientific fields and of their applications in scientific research. Specifically, it sets out to:

1. **Reveal cross-field and cross-modal connections**: Existing surveys on scientific LLMs usually focus on one or two fields or a single modality. This paper instead covers the broader research landscape by showing how architectures and pre-training techniques connect across fields and modalities.
2. **Summarize and analyze scientific LLMs**: The paper surveys more than 260 scientific LLMs, discusses their commonalities and differences, and summarizes the pre-training datasets and evaluation tasks for each field and modality.
3. **Explore the applications of LLMs in scientific discovery**: It examines how LLMs have been deployed to benefit scientific discovery, including applications in hypothesis generation, theorem proving, experimental design, drug discovery, and weather forecasting.
4. **Provide resources**: Resources related to the survey are available on GitHub (https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models), so that readers can explore and use these models further.

Through these goals, the paper seeks to depict the connections between different scientific LLMs more accurately, show what they have in common, and possibly guide future design and development.

Scientific Large Language Models: A Survey on Biological & Chemical Domains

A Survey of Pre-trained Language Models for Processing Scientific Text

A Survey for Large Language Models in Biomedicine

An Interdisciplinary Outlook on Large Language Models for Scientific Research

Multilingual Large Language Models: A Systematic Survey

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

A Survey on Evaluation of Large Language Models

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

Large Language Models for Time Series: A Survey

A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations

A Survey of Graph Meets Large Language Model: Progress and Future Directions

Towards Efficient Large Language Models for Scientific Text: A Review

History, Development, and Principles of Large Language Models-An Introductory Survey

Evaluating Large Language Models: A Comprehensive Survey

Large Language Models Meet NLP: A Survey

Efficient Large Language Models: A Survey

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge