Abstract: Objectives: Generative large language models (LLMs) are a subset of transformer-based neural network models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This integration can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). Target audience: Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. Scope: We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs, while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain-of-thought prompting, and the phenomenon of emergent capabilities observed in LLMs, which can be leveraged to address complex NLP challenges in biomedical applications.

A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences

Harnessing Large Language Models in Medical Research and Scientific Writing: A Closer Look to The Future

Benchmarking Large Language Models in Evidence-Based Medicine

A Survey for Large Language Models in Biomedicine

Large language models in healthcare and medical domain: A review

Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation

Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis

Large language models for biomedicine: foundations, opportunities, challenges, and best practices

Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges

Large Language Models in Medicine: The Potentials and Pitfalls

Large Language Models for Biomedical Knowledge Graph Construction: Information extraction from EMR notes

Large Language Models to process, analyze, and synthesize biomedical texts – a scoping review

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

Application Research of Large Language Models in Medicine: Status, Problems, and Future

The Application of Large Language Models in Gastroenterology: A Review of the Literature

Large Language Models for Medicine: A Survey

The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions

A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

Large Language Models and Medical Knowledge Grounding for Diagnosis Prediction

Large language models in medical and healthcare fields: applications, advances, and challenges