Abstract: Tomato leaf diseases pose a significant challenge for tomato farmers, resulting in substantial reductions in crop productivity. The timely and precise identification of tomato leaf diseases is crucial for successfully implementing disease management strategies. This paper introduces a transformer-based model called TomFormer for the purpose of tomato leaf disease detection. The paper's primary contributions include the following: Firstly, we present a novel approach for detecting tomato leaf diseases by employing a fusion model that combines a visual transformer and a convolutional neural network. Secondly, we aim to apply our proposed methodology to the Hello Stretch robot to achieve real-time diagnosis of tomato leaf diseases. Thirdly, we assessed our method by comparing it to models like YOLOS, DETR, ViT, and Swin, demonstrating its ability to achieve state-of-the-art outcomes. For the purpose of the experiment, we used three datasets of tomato leaf diseases, namely KUTomaDATA, PlantDoc, and PlantVillage, where KUTomaDATA is being collected from a greenhouse in Abu Dhabi, UAE. Finally, we present a comprehensive analysis of the performance of our model and thoroughly discuss the limitations inherent in our approach. TomFormer performed well on the KUTomaDATA, PlantDoc, and PlantVillage datasets, with mean average precision (mAP) scores of 87%, 81%, and 83%, respectively. The comparative results in terms of mAP demonstrate that our method exhibits robustness, accuracy, efficiency, and scalability. Furthermore, it can be readily adapted to new datasets. We are confident that our work holds the potential to significantly influence the tomato industry by effectively mitigating crop losses and enhancing crop yields.
Image and Video Processing, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenge of detecting tomato leaf diseases. These diseases are a serious problem for tomato growers and lead to a substantial decline in crop yields, so timely and accurate identification is crucial for effective disease management. The research therefore develops a Transformer-based model, TomFormer, for detecting tomato leaf diseases. The main contributions of the study include:
1. **Proposing a new method**: By combining Visual Transformer and Convolutional Neural Network (CNN), a fusion model is constructed to detect tomato leaf diseases.
2. **Application to Hello Stretch robot**: The authors aim to deploy the proposed model on the Hello Stretch robot to achieve real-time diagnosis of tomato leaf diseases.
3. **Performance evaluation**: The model is compared with existing models such as YOLOS, DETR, ViT and Swin on three tomato leaf disease datasets (KUTomaDATA, PlantDoc and PlantVillage), where it is reported to achieve state-of-the-art results.
4. **Comprehensive analysis**: The performance of the model is analyzed in detail, and the limitations in the method are discussed.
### Background of the paper
Tomato (scientific name: Solanum lycopersicum) is a widely planted crop, but plant diseases and pests can significantly reduce its yields, resulting in economic losses and social impacts. Traditional disease identification methods are time-consuming and costly, and farmers usually rely on other farmers or agricultural professionals to identify diseases. Automated, AI-based image analysis can therefore make identification more efficient.
### Related work
Computer vision technology has developed rapidly in recent years, especially in image and video analysis. Many artificial intelligence methods have been used for plant disease identification, including the k-Nearest Neighbor algorithm (k-NN), Logistic Regression (LR), Decision Trees (DTs), Support Vector Machines (SVMs) and Deep Convolutional Neural Networks (DCNNs). However, these methods still have deficiencies in terms of model efficiency. Transformer networks, which have performed excellently in natural language processing tasks, are now also widely used in computer vision tasks such as image classification and object detection.
### Proposed method
The TomFormer model combines the strengths of the Vision Transformer (ViT) and the DEtection TRansformer (DETR) into a new object detection method. Specifically:
1. **Feature extraction**: A Transformer network extracts features from tomato leaf images. Transformers can capture long-range dependencies in data sequences, which is important for identifying symptoms of different diseases in different parts of the leaf.
2. **Image acquisition**: The Hello Stretch robot collects images of tomato leaves. This robot is equipped with a depth camera and can capture high-quality images.
3. **Model structure**:
- **Image processing head**: Remove the cls token in ViT and introduce N learnable object queries to enhance the object detection ability of the model.
- **TomFormer encoder**: Combine position embedding and convolutional features, and perform feature extraction through Multi-Head Self-Attention (MSA) blocks and Multi-Layer Perceptron (MLP) blocks.
- **Feed-forward network**: Use a single Feed-Forward Network (FFN) for classification and bounding box regression tasks.
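
The structure described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the image size, patch size, embedding dimension, number of queries, number of classes, random weights, and all function names (`num_patches`, `self_attention`, `detect`) are assumptions chosen for illustration. It uses one attention head and one linear layer in place of the full MSA/MLP stack and the FFN. The point is only to show how N object-query tokens take the place of the cls token, attend to every patch token, and are then decoded into a class distribution and a bounding box.

```python
import math
import random


def num_patches(height, width, patch):
    """Number of non-overlapping patch tokens for an image."""
    return (height // patch) * (width // patch)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def rand_matrix(rows, cols, rng):
    """Random stand-in for learned parameters or extracted features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def detect(height=64, width=64, patch=16, dim=8, n_queries=3, n_classes=4, seed=0):
    """Toy forward pass; every default value here is hypothetical."""
    rng = random.Random(seed)
    n_patch = num_patches(height, width, patch)

    # Stand-in for convolutional patch features plus position embedding.
    patch_tokens = rand_matrix(n_patch, dim, rng)
    # N object queries are appended; no cls token is used.
    query_tokens = rand_matrix(n_queries, dim, rng)
    tokens = patch_tokens + query_tokens

    wq, wk, wv = (rand_matrix(dim, dim, rng) for _ in range(3))
    out, weights = self_attention(tokens, wq, wk, wv)

    # Only the query outputs are decoded, by one shared linear head
    # (a simplification of the FFN) into class scores and a box.
    head = rand_matrix(dim, n_classes + 4, rng)
    preds = []
    for row in matmul(out[n_patch:], head):
        probs = softmax(row[:n_classes])
        box = [1.0 / (1.0 + math.exp(-v)) for v in row[n_classes:]]  # normalized to (0, 1)
        preds.append((probs, box))
    return n_patch, weights, preds


if __name__ == "__main__":
    n_patch, weights, preds = detect()
    print("patch tokens:", n_patch, "| total tokens:", len(weights))
    for probs, box in preds:
        print("class probabilities:", [round(p, 3) for p in probs],
              "box:", [round(b, 3) for b in box])
```

Each row of the attention weights spans every token in the sequence, so a query token can draw on any patch of the leaf regardless of distance. This is the long-range dependency property the summary attributes to the Transformer.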
### Experimental results
The authors ran experiments on three datasets (KUTomaDATA, PlantDoc and PlantVillage). TomFormer reaches a mean Average Precision (mAP) of 87%, 81% and 83% on these datasets respectively, which the paper reports as better than the other models compared. The specific results are shown in the following table:
| Category |