Abstract: The body of ecological literature, which informs much of our knowledge of the global loss of biodiversity, has grown rapidly in recent decades. The increasing difficulty of synthesising this literature manually has in turn created a growing demand for automated text mining methods. Within the domain of deep learning, large language models (LLMs) have attracted considerable attention in recent years by virtue of great leaps in progress and a wide range of potential applications; however, quantitative investigation into their potential in ecology has so far been lacking. In this work, we analyse the ability of GPT-4 to extract information about invertebrate pests and pest controllers from the abstracts of a body of literature on biological pest control, using a bespoke, zero-shot prompt. Our results show that the performance of GPT-4 is highly competitive with other state-of-the-art tools used for taxonomic named entity recognition and geographic location extraction tasks. On a held-out test set, we show that species and geographic locations are extracted with F1-scores of 99.8% and 95.3%, respectively, and highlight that the model is able to distinguish very effectively between the primary roles of interest (predators, parasitoids and pests). Moreover, we demonstrate the ability of the model to effectively extract and predict taxonomic information across various taxonomic ranks, and to automatically correct spelling mistakes. However, we do report a small number of cases of fabricated information (hallucinations). Given the current lack of specialised, pre-trained ecological language models, general-purpose LLMs may provide a promising way forward in ecology. Combined with tailored prompt engineering, such models can be employed for a wide range of text mining tasks in ecology, with the potential to greatly reduce the time spent on manual screening and labelling of the literature.
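For readers unfamiliar with the metric, the F1-scores quoted above combine precision and recall into a single number. The sketch below shows one common, set-based way such a score could be computed for a single abstract. It is an illustration only: the abstract does not describe the evaluation protocol, so the exact-match comparison, the function name `f1_score`, and the species names in the example are all assumptions rather than the authors' method or data.

```python
def f1_score(predicted, gold):
    """Set-based precision, recall and F1 for one extraction task.

    Assumption: an extracted entity counts as correct only if it matches
    a gold-standard entity exactly (the paper may use a different rule).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return 0.0, 0.0, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for one abstract (not taken from the paper's data).
gold = {"Aphis gossypii", "Harmonia axyridis", "Trichogramma pretiosum"}
predicted = {"Aphis gossypii", "Harmonia axyridis", "Coccinella septempunctata"}

precision, recall, f1 = f1_score(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the three predicted species are correct and two of the three gold species are recovered, so precision, recall and F1 are all equal to 2/3.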

Towards unearthing neglected climate innovations from scientific literature using Large Language Models

Enhancing Large Language Models with Climate Resources

Climate Change from Large Language Models

Exploring Large Language Models for Climate Forecasting

Assessing Large Language Models on Climate Information

Assessing the Effectiveness of GPT-4o in Climate Change Evidence Synthesis and Systematic Assessments: Preliminary Insights

Automated Fact-Checking of Climate Change Claims with Large Language Models

ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change

Opportunities and Challenges of Applying Large Language Models in Building Energy Efficiency and Decarbonization Studies: An Exploratory Overview

Double Jeopardy and Climate Impact in the Use of Large Language Models: Socio-economic Disparities and Reduced Utility for Non-English Speakers

Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models

An Interdisciplinary Outlook on Large Language Models for Scientific Research

Large language models overcome the challenges of unstructured text data in ecology

Can Large Language Models Unlock Novel Scientific Research Ideas?

Large language models help facilitate the automated synthesis of information on potential pest controllers

A Survey of Sustainability in Large Language Models: Applications, Economics, and Challenges

Using Large Language Models for a standard assessment mapping for sustainable communities

Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

Using Large Language Models to Enhance the Reusability of Sensor Data

Large language models are changing landscape of academic publications. A positive transformation?