ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources

Shuting Yang, Zehui Liu, Wolfgang Mayer
2024-09-20
Abstract: Recent developments in large language models (LLMs) have led to significant improvements in intelligent dialogue systems' ability to handle complex inquiries. However, current LLMs still exhibit limitations in specialized domain knowledge, particularly in technical fields such as agriculture. To address this problem, we propose ShizishanGPT, an intelligent question answering system for agriculture based on the Retrieval Augmented Generation (RAG) framework and agent architecture. ShizishanGPT consists of five key modules: a generic GPT-4 based module for answering general questions; a search engine module that compensates for the fact that the large language model's own knowledge cannot be updated in a timely manner; an agricultural knowledge graph module for providing domain facts; a retrieval module that uses RAG to supplement domain knowledge; and an agricultural agent module that invokes specialized models for tasks such as crop phenotype prediction and gene expression analysis. We evaluated ShizishanGPT using a dataset of 100 agricultural questions specially designed for this study. The experimental results show that the tool significantly outperforms general LLMs, providing more accurate and detailed answers owing to its modular design and integration of different domain knowledge sources. Our source code, dataset, and model weights are publicly available at https://github.com/Zaiwen/CropGPT.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limited domain knowledge of current large language models (LLMs) in specialized fields, particularly agriculture. Although existing LLMs handle complex queries well, they fall short on professional agricultural tasks, especially in using advanced genetic tools and retrieving crop-related knowledge, which limits their usefulness in precision agriculture and crop management. To address this challenge, the paper proposes ShizishanGPT, an intelligent question-answering system based on the Retrieval-Augmented Generation (RAG) framework and an agent architecture. The system aims to improve LLM performance in agriculture by integrating different domain knowledge sources to provide more accurate and detailed answers. Specifically, ShizishanGPT consists of five key modules:

1. **General GPT-4 module**: Answers general questions.
2. **Search engine module**: Compensates for the fact that an LLM's own knowledge cannot be updated in a timely manner.
3. **Agricultural knowledge graph module**: Provides domain facts.
4. **Retrieval module**: Uses RAG to supplement domain knowledge.
5. **Agricultural agent module**: Invokes specialized models for tasks such as crop phenotype prediction and gene expression analysis.

The paper evaluates ShizishanGPT on a purpose-built dataset of 100 agricultural questions and compares it with general LLMs. The experimental results show that ShizishanGPT significantly outperforms the other models on evaluation metrics such as BLEU, ROUGE, and GLEU, especially in terms of similarity and semantic consistency. These results indicate that ShizishanGPT offers higher accuracy and reliability for intelligent question answering in agriculture.
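
To make the five-module design more concrete, the following is a minimal sketch of how a question might be dispatched to one of the modules. It is not the authors' implementation: the handler functions, the `Route` class, and the keyword lists are all hypothetical, and simple keyword matching stands in for whatever tool-selection logic the actual agent uses. Each handler returns a placeholder string rather than calling GPT-4, a search engine, a knowledge graph, a retriever, or a specialized model.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-ins for the five modules. Each returns a tagged
# placeholder instead of contacting a real backend.
def general_llm(question: str) -> str:
    return f"[general] {question}"


def web_search(question: str) -> str:
    return f"[search] {question}"


def knowledge_graph(question: str) -> str:
    return f"[knowledge_graph] {question}"


def rag_retrieval(question: str) -> str:
    return f"[rag] {question}"


def agri_agent(question: str) -> str:
    return f"[agent] {question}"


@dataclass
class Route:
    name: str                       # module label
    keywords: List[str]             # assumed trigger phrases (illustrative only)
    handler: Callable[[str], str]   # function that produces the answer


# Routes are checked in order; the first match wins.
ROUTES = [
    Route("agent", ["predict", "phenotype", "gene expression"], agri_agent),
    Route("search", ["latest", "this week", "news"], web_search),
    Route("knowledge_graph", ["which gene", "relation between"], knowledge_graph),
    Route("rag", ["according to the literature", "cultivation guide"], rag_retrieval),
]


def select_module(question: str) -> str:
    """Return the name of the first route whose keyword occurs in the question."""
    lowered = question.lower()
    for route in ROUTES:
        if any(keyword in lowered for keyword in route.keywords):
            return route.name
    return "general"  # fall back to the general-purpose module


def answer(question: str) -> str:
    """Dispatch the question to the selected module's handler."""
    name = select_module(question)
    for route in ROUTES:
        if route.name == name:
            return route.handler(question)
    return general_llm(question)


if __name__ == "__main__":
    print(answer("Predict the phenotype of this maize line"))
    print(answer("How are you today?"))
```

In this sketch, the general module acts as the fallback when no specialized route matches, which mirrors the division of roles described above. A real system would likely let the LLM itself choose the tool rather than rely on fixed keywords.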