Ten simple rules for using large language models in science, version 1.0
Gabriel Reuben Smith, Carolina Bello, Lalasia Bialic-Murphy, Emily Clark, Camille S. Delavaux, Camille Fournier de Lauriere, Johan van den Hoogen, Thomas Lauber, Haozhi Ma, Daniel S. Maynard, Matthew Mirman, Lidong Mo, Dominic Rebindaine, Josephine Elena Reek, Leland K. Werden, Zhaofei Wu, Gayoung Yang, Qingzhou Zhao, Constantin M. Zohner, Thomas W. Crowther
DOI: https://doi.org/10.1371/journal.pcbi.1011767
2024-02-01
PLoS Computational Biology
Abstract: Generative artificial intelligence (AI) tools, including large language models (LLMs), are expected to radically alter the way we live and work, with as many as 300 million jobs at risk [1]. Arguably the best-known LLM at present is GPT (generative pre-trained transformer), developed by the American company OpenAI [2]. Since its release in late 2022, GPT's chatbot interface, ChatGPT, has exploded in popularity, setting a new record for the fastest-growing user base in history [3]. The appeal of GPT and other LLMs stems from their ability to effectively carry out multistep tasks and provide clear, human-like responses to complicated queries and prompts (Box 1). Unsurprisingly, this capacity is catching the eye of scientists [4]. Indeed, there is increasing interest in using GPT and other LLMs to accelerate scientific progress for the benefit of humankind [5]. However, specific challenges concerning possible misuse of LLMs in science are arising [6] in tandem with broader concerns about potential societal disruption and ethical risks [7,8]. As such, there is an urgent need for the scientific community to establish general guiding principles for the appropriate use of LLMs and other generative AI tools to maximise benefit and minimise harm [9,10]. Here, we propose a set of 10 simple rules for using LLMs in science, drawn from our own experimentation as cautiously optimistic environmental scientists curious about novel tools to streamline research. We note that the list is grounded in our expertise as scientists and experience as end-users of LLMs (GPT specifically), not as AI developers. We also note that we do not address other sorts of generative AI here, although these could also be increasingly used for scientific research in the future.
We suggest safeguards against 5 areas of concern (Rules 1 to 5), complemented by suggestions for areas where LLMs have the potential to support scientific research if sufficient care is taken to avoid issues (Rules 6 to 10). Since LLMs are predictive language models, our use suggestions focus on language-centric aspects of scientific research, such as computer coding, writing, and publishing. As developments in this field are rapid and outcomes often unpredictable [11], we envision that these guidelines can provide a starting point, not an end point; they will likely need to be revisited and adapted as circumstances change. We also envision that our list may provide a basis for better standardised reporting and documentation (S1 Appendix) usable across journals, allowing researchers who are submitting manuscripts to document their use(s) of LLMs and affirm that they have appropriately considered potential problem areas.
biochemical research methods, mathematical & computational biology