Visual Large Language Model for Wheat Disease Diagnosis in the Wild

Kunpeng Zhang, Li Ma, Beibei Cui, Xin Li, Boqiang Zhang, Na Xie
DOI: https://doi.org/10.1016/j.compag.2024.109587
IF: 8.3
2024-01-01
Computers and Electronics in Agriculture
Abstract: Early detection of symptoms in wheat plants is crucial for mitigating the effects of disease and preventing its spread. Prompt phytosanitary treatment minimizes yield losses and enhances treatment efficacy. In recent years, numerous image analysis-based methodologies for automatic disease identification have been developed, with Convolutional Neural Networks (CNNs) achieving notable success in visual classification tasks. However, existing methods often lack the intelligence and reasoning needed for real-world applications. This study introduces an advanced wheat disease diagnosis approach using a Visual Language Model (VLM), named the Wheat Disease Language Model (WDLM). The WDLM first leverages a modified Segment Anything Model (SAM) to isolate key wheat features from complex wild environments. To enhance its logical reasoning abilities, the WDLM integrates a reasoning chain to generate clear, reasoned explanations for its diagnoses. By employing dedicated prompt engineering, this study establishes the Wheat Disease Semantic Dataset (WDSD) to fine-tune the VLM. The WDSD, which includes a diverse set of wheat images from various sources, bridges the gap between advanced VLM technology and wheat pathology. Fine-tuned on this task-specific data, the WDLM provides accurate classification of wheat diseases and suggests potential treatment options. Compared to CNN-based models, Transformer-based models, and other VLMs, the WDLM shows improved performance in various scenarios. Integrated with mobile applications, the WDLM approach is readily applicable in the field, representing a promising advancement in the intelligent diagnosis of wheat diseases.