Weed Recognition Method based on Hybrid CNN-Transformer Model

Jun Zhang
DOI: https://doi.org/10.54097/fcis.v4i2.10209
2023-06-26
Abstract: As an important task in precision agriculture, weed recognition plays a crucial role in crop management and yield improvement. However, achieving high accuracy and efficiency at the same time remains a challenge. To balance accuracy and timeliness, this paper proposes a hybrid CNN-Transformer model for weed recognition. The model combines convolutional neural network (CNN) and Transformer structures for feature extraction and classification, taking into account both local and global information. In addition, the proposed Transformer Block incorporates the SDTA (Segmentation Depth Transpose Attention) mechanism to improve timeliness. Furthermore, this paper modifies the original ViT model to enhance its accuracy. Experimental results on the DeepWeeds dataset by Olsen et al. show that the proposed hybrid model outperforms the original Vision Transformer model in weed recognition accuracy (96.08% vs. 89.43%). This research provides an effective solution for weed recognition using a hybrid model, with high practical value and application prospects.
Engineering, Agricultural and Food Sciences, Computer Science
What problem does this paper attempt to address?
The paper addresses the problem of identifying weeds both accurately and efficiently in precision agriculture, where existing weed identification methods tend to trade accuracy against real-time performance. It proposes a hybrid CNN-Transformer model that combines the strengths of Convolutional Neural Networks (CNN), which capture local information, and Transformer structures, which capture global information. The paper also introduces the SDTA (Segmentation Depth Transpose Attention) mechanism into the Transformer Block to improve the model's real-time performance, and modifies the original ViT model to boost its accuracy. On the DeepWeeds dataset, the proposed hybrid model reaches 96.08% accuracy, compared with 89.43% for the original Vision Transformer model, a gain of 6.65 percentage points.
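The summary does not spell out how SDTA works, so the following is only a minimal sketch of the general idea behind transposed attention, under stated assumptions. It assumes that "transpose attention" means computing the attention map across feature channels (a C x C matrix) instead of across spatial tokens (an N x N matrix), which makes the cost grow linearly with the number of tokens. The function names, the single-head layout, the L2 normalization, and the `temperature` parameter are all illustrative choices, not details taken from the paper, and the split or depth-wise parts of SDTA are omitted entirely.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def l2_normalize_rows(m, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return [[v / (math.sqrt(sum(x * x for x in row)) + eps) for v in row] for row in m]


def transposed_attention(tokens, w_q, w_k, w_v, temperature=1.0):
    """Hypothetical single-head channel attention.

    tokens: N x C matrix (N spatial tokens, C channels).
    Returns (output, attn), where output is N x C and attn is C x C.
    """
    q = matmul(tokens, w_q)  # N x C
    k = matmul(tokens, w_k)  # N x C
    v = matmul(tokens, w_v)  # N x C

    # Treat channels as rows (C x N) and normalize each channel over tokens.
    q_t = l2_normalize_rows(transpose(q))
    k_t = l2_normalize_rows(transpose(k))

    # Channel-to-channel similarity: C x C, independent of N in size.
    scores = matmul(q_t, transpose(k_t))
    scores = [[s * temperature for s in row] for row in scores]
    attn = softmax_rows(scores)

    # Mix the value channels, then return to the N x C token layout.
    out_t = matmul(attn, transpose(v))  # C x N
    return transpose(out_t), attn


random.seed(0)
n_tokens, n_channels = 6, 4
tokens = [[random.gauss(0, 1) for _ in range(n_channels)] for _ in range(n_tokens)]
w_q, w_k, w_v = (
    [[random.gauss(0, 0.5) for _ in range(n_channels)] for _ in range(n_channels)]
    for _ in range(3)
)

output, attn = transposed_attention(tokens, w_q, w_k, w_v)
print(len(output), len(output[0]))  # 6 4
print(len(attn), len(attn[0]))      # 4 4
```

In this sketch the attention map stays at 4 x 4 no matter how many tokens are fed in, which is the property that would make such a design attractive when real-time performance matters. A standard self-attention layer on the same input would instead build a 6 x 6 map that grows quadratically with image resolution.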