GPT-4 as Evaluator: Evaluating Large Language Models on Pest Management in Agriculture

Shanglong Yang, Zhipeng Yuan, Shunbao Li, Ruoling Peng, Kang Liu, Po Yang

2024-03-18

Abstract: In the rapidly evolving field of artificial intelligence (AI), the application of large language models (LLMs) in agriculture, particularly in pest management, remains nascent. We aimed to demonstrate its feasibility by evaluating the pest management advice generated by LLMs, including the Generative Pre-trained Transformer (GPT) series from OpenAI and the FLAN series from Google. Because agricultural advice is context-specific, automatically measuring the quality of LLM-generated text is a significant challenge. We proposed an innovative approach, using GPT-4 as an evaluator, to score the generated content on Coherence, Logical Consistency, Fluency, Relevance, Comprehensibility, and Exhaustiveness. Additionally, we integrated an expert system based on crop threshold data as a baseline to score Factual Accuracy, that is, whether the pests found in a crop field warrant management action. Each model's category scores were weighted by percentage to obtain a final score. The results showed that GPT-3.5 and GPT-4 outperformed the FLAN models in most evaluation categories. Furthermore, instruction-based prompting containing domain-specific knowledge reached an accuracy rate of 72%, demonstrating the feasibility of LLMs as an effective tool for providing pest management suggestions in agriculture.
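The percentage weighting mentioned in the abstract can be illustrated with a minimal sketch. The category names follow the abstract, but the weight values, the 1-to-5 score scale, and the function name `final_score` are assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical percentage weights per evaluation category.
# These values are illustrative assumptions, not the paper's weighting.
WEIGHTS = {
    "coherence": 10,
    "logical_consistency": 10,
    "fluency": 10,
    "relevance": 10,
    "comprehensibility": 10,
    "exhaustiveness": 10,
    "factual_accuracy": 40,
}


def final_score(scores, weights=WEIGHTS):
    """Combine per-category scores into one percentage-weighted score."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    # Each category contributes its score times its share of the total.
    return sum(scores[name] * pct for name, pct in weights.items()) / 100


# Example: a model rated 4 on every text-quality category (assumed 1-5 scale)
# but only 1 on factual accuracy.
example = {name: 4 for name in WEIGHTS}
example["factual_accuracy"] = 1
print(final_score(example))
```

Under these assumed weights, a low Factual Accuracy score pulls the final score down sharply even when the text-quality categories are rated highly, which is one reason a scheme of this kind might weight factual correctness more heavily.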

Computation and Language

What problem does this paper attempt to address?

The paper explores the potential of large language models (LLMs) in agricultural pest management and proposes a method for evaluating the quality of the pest management recommendations these models generate. Specifically, it addresses the following key issues:

1. **Feasibility Verification**: Demonstrates that LLMs can feasibly provide recommendations for agricultural pest management.
2. **Innovative Evaluation Method**: Introduces a method that uses GPT-4 as an evaluator to score the generated content on coherence, logical consistency, fluency, relevance, comprehensibility, and exhaustiveness.
3. **Effectiveness of Instructional Prompting Techniques**: Shows that instruction-based prompting containing domain-specific knowledge lets LLMs make effective pest management decisions, with an accuracy rate of 72%.
4. **Model Differences**: Reveals subtle differences between GPT-3.5 and GPT-4 in pest management decision-making, emphasizing the importance of selecting an appropriate model for agricultural settings.

To achieve these objectives, the researchers tested several large language models, including the GPT series (GPT-3.5 and GPT-4) and the FLAN series, in a series of experiments. They also built an expert system based on crop threshold data as a baseline for evaluating factual accuracy. Additionally, the study compared different prompting techniques, including zero-shot prompting, few-shot prompting, instruction-based prompting, and self-consistency prompting, to optimize model performance and improve its practicality in agricultural pest management scenarios.
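A threshold-based baseline and a self-consistency vote of the kind described above could look roughly like the following sketch. The threshold table, its values, and all function names are hypothetical and chosen only for illustration; the paper's actual expert system and prompts are not reproduced here.

```python
from collections import Counter

# Hypothetical action thresholds (pests per plant) keyed by (crop, pest).
# The values are illustrative assumptions, not real agronomic guidance.
THRESHOLDS = {
    ("wheat", "aphid"): 5.0,
    ("oilseed rape", "pollen beetle"): 15.0,
}


def baseline_action_needed(crop, pest, density):
    """Expert-system baseline: act when density reaches the threshold."""
    return density >= THRESHOLDS[(crop, pest)]


def self_consistent_decision(sampled_answers):
    """Majority vote over several sampled yes/no answers from one model.

    A tie resolves to False (no action), which is an arbitrary choice here.
    """
    votes = Counter(sampled_answers)
    return votes[True] > votes[False]


def accuracy(predictions, references):
    """Fraction of model decisions that match the baseline decisions."""
    if not references or len(predictions) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


# Example: three observed fields and three sampled model answers per field.
fields = [("wheat", "aphid", 6.0), ("wheat", "aphid", 2.0),
          ("oilseed rape", "pollen beetle", 20.0)]
samples = [[True, True, False], [True, False, True], [True, True, True]]

references = [baseline_action_needed(*field) for field in fields]
predictions = [self_consistent_decision(s) for s in samples]
print(accuracy(predictions, references))
```

In this toy run the model's majority answer disagrees with the baseline on the second field, so two of three decisions match. An accuracy figure such as the reported 72% would be computed the same way over the full set of test cases.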

