Abstract: Plant disease detection is a critical task in agriculture, essential for ensuring crop health and productivity. Traditional detection methods are often labor-intensive and error-prone, highlighting the need for automated solutions. Computer vision-based solutions have been successfully deployed in recent years for plant disease identification and localization, but the two tasks are usually handled independently, leading to suboptimal performance. An integrated solution that combines them is needed for improved efficiency and accuracy. This research proposes the Plant Disease Localization and Classification model based on Vision Transformer (PDLC-ViT), which integrates co-scale, co-attention, and cross-attention mechanisms with a ViT within a Multi-Task Learning (MTL) framework. The model was trained and evaluated on the Plant Village dataset. Key hyperparameters, including learning rate, batch size, dropout ratio, and regularization factor, were optimized through a grid search, and early stopping based on validation loss was employed to prevent overfitting. The integration of co-scale, co-attention, and cross-attention mechanisms allowed the model to capture multi-scale dependencies and enhance feature learning, leading to superior performance compared to existing models in both localization and classification. Evaluated on two public datasets, the PDLC-ViT model achieved an accuracy of 99.97%, a Mean Average Precision (MAP) of 99.18%, and a Mean Average Recall (MAR) of 99.11%. These results underscore the model's precision and recall and highlight its robustness and reliability in detecting and classifying plant diseases. The PDLC-ViT model sets a new benchmark in plant disease detection, offering a reliable tool for agricultural applications. Its ability to integrate localization and classification within an MTL framework promotes timely and accurate disease management, contributing to sustainable agriculture and food security.
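
The tuning procedure named in the abstract (a grid search over hyperparameters, with early stopping on validation loss) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the grid values, the patience setting, and the synthetic loss curve below are all assumptions made for illustration, and a real run would replace `toy_validation_losses` with actual training of the model.

```python
import itertools
import math


def run_with_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Consume per-epoch validation losses and stop once the loss has not
    improved for `patience` consecutive epochs.

    Returns (best_loss, epochs_run).
    """
    best = math.inf
    bad_epochs = 0
    epochs_run = 0
    for loss in val_losses:
        epochs_run += 1
        if loss < best - min_delta:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # no improvement for `patience` epochs
    return best, epochs_run


def grid_search(grid, train_fn, patience=3):
    """Try every combination in `grid` and keep the configuration with the
    lowest early-stopped validation loss."""
    best_cfg, best_loss = None, math.inf
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss, _ = run_with_early_stopping(train_fn(cfg), patience=patience)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss


def toy_validation_losses(cfg, epochs=20):
    """Hypothetical stand-in for real training: yields a synthetic loss
    curve whose floor is lowest near learning_rate=1e-3 and low dropout."""
    floor = abs(math.log10(cfg["learning_rate"]) + 3) + cfg["dropout"]
    for epoch in range(epochs):
        yield floor + 1.0 / (epoch + 1)


# Illustrative grid only; the abstract does not report the values searched.
grid = {"learning_rate": [1e-2, 1e-3, 1e-4], "dropout": [0.1, 0.3]}
best_cfg, best_loss = grid_search(grid, toy_validation_losses)
print(best_cfg, round(best_loss, 3))
```

Because `train_fn` returns a lazy iterable of losses, the loop stops pulling epochs as soon as the patience budget is exhausted, which is the point of early stopping: unpromising configurations cost fewer epochs. Batch size and the regularization factor would be added as further keys in `grid`.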

Multi-Label Plant Species Classification with Self-Supervised Vision Transformers

Transfer Learning with Self-Supervised Vision Transformers for Snake Identification

Crop Disease Identification by Fusing Multiscale Convolution and Vision Transformer

Asymmetric Vision Transformers for Multi-Label Classification

Multi-label remote sensing classification with self-supervised gated multi-modal transformers

Multi-label classification of retinal disease via a novel vision transformer model

Transfer learning for versatile plant disease recognition with limited data

Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

Self-supervised Vision Transformers for Land-cover Segmentation and Classification

HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

A Multitask Learning-Based Vision Transformer for Plant Disease Localization and Classification

Distance Restricted Transformer Encoder for Multi-Label Classification

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology

Plant-part Segmentation Using Deep Learning and Multi-View Vision

Automatic classification of ligneous leaf diseases via hierarchical vision transformer and transfer learning

Automated classification of remote sensing satellite images using deep learning based vision transformer

Unsupervised Transfer Learning for Plant Anomaly Recognition

Query2Label: A Simple Transformer Way to Multi-Label Classification

Vision Transformer for Multispectral Satellite Imagery: Advancing Landcover Classification