Class-specific Data Augmentation for Plant Stress Classification

Nasla Saleem, Aditya Balu, Talukder Zaki Jubery, Arti Singh, Asheesh K. Singh, Soumik Sarkar, Baskar Ganapathysubramanian
2024-06-19
Abstract: Data augmentation is a powerful tool for improving deep learning-based image classifiers for plant stress identification and classification. However, selecting an effective set of augmentations from a large pool of candidates remains a key challenge, particularly in imbalanced and confounding datasets. We propose an approach for automated class-specific data augmentation using a genetic algorithm. We demonstrate the utility of our approach on soybean [Glycine max (L.) Merr] stress classification where symptoms are observed on leaves; a particularly challenging problem due to confounding classes in the dataset. Our approach yields substantial performance, achieving a mean-per-class accuracy of 97.61% and an overall accuracy of 98% on the soybean leaf stress dataset. Our method significantly improves the accuracy of the most challenging classes, with notable enhancements from 83.01% to 88.89% and from 85.71% to 94.05%, respectively. A key observation we make in this study is that high-performing augmentation strategies can be identified in a computationally efficient manner. We fine-tune only the linear layer of the baseline model with different augmentations, thereby reducing the computational burden associated with training classifiers from scratch for each augmentation policy while achieving exceptional performance. This research represents an advancement in automated data augmentation strategies for plant stress classification, particularly in the context of confounding datasets. Our findings contribute to the growing body of research in tailored augmentation techniques and their potential impact on disease management strategies, crop yields, and global food security. The proposed approach holds the potential to enhance the accuracy and efficiency of deep learning-based tools for managing plant stresses in agriculture.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **how to improve the accuracy of plant stress classification by automatically selecting class-specific data augmentation strategies, especially for imbalanced datasets that contain confounding classes**. Specifically, the authors focus on:

1. **The challenge of manually selecting effective data augmentations**: Choosing augmentations suited to a specific dataset from a large pool of candidates is complex and time-consuming. The problem is more pronounced when the dataset is imbalanced or contains confounding (difficult-to-distinguish) classes.
2. **The limitations of traditional data augmentation**: Traditional augmentation is usually applied uniformly to the entire dataset, ignoring that classes differ in their sensitivity to each augmentation. An augmentation may benefit some classes while harming others.
3. **Limited computing resources**: Existing automated augmentation methods (such as AutoAugment and Fast AutoAugment) are effective but computationally expensive, which makes them a poor fit for resource-limited settings.

To address these problems, the authors propose an automated class-specific data augmentation method based on a genetic algorithm (GA). Its main goals are:

- **Automatically select the best class-specific augmentation strategy**: Optimize the augmentation strategy for each class with the genetic algorithm to maximize the mean-per-class accuracy (MPCA) of the classifier.
- **Reduce the computational burden**: Fine-tune only the linear layer of the convolutional neural network (CNN) under each candidate augmentation strategy, instead of training a classifier from scratch for every strategy.

### Method overview

1. **Dataset**: The study used a publicly available dataset of 16,573 soybean leaf images covering nine classes (eight soybean stress types and healthy leaves).
2. **Baseline model**: ResNet50 was selected as the baseline model and trained without any data augmentation, so that later gains can be attributed to the augmentation strategy.
3. **Genetic algorithm-optimized data augmentation**:
   - **Initialization**: Create an initial set of augmentation probabilities ranging from 0 to 1.
   - **Evaluation**: Score each augmentation strategy by the mean-per-class accuracy on the test set.
   - **Selection**: Select the higher-performing augmentation strategies as the parents of the next generation.
   - **Crossover**: Combine the probabilities of two augmentation strategies to generate new offspring.
   - **Mutation**: Introduce random changes to maintain diversity and explore new regions of the search space.
4. **Fine-tune the baseline model**: Use the augmentation strategy generated by the genetic algorithm to fine-tune the baseline model, training for only 5 epochs to keep the computational cost low.

### Results

The experimental results show that the method improves the mean-per-class accuracy of the classifier from 95.09% to 97.61%. For the two most difficult classes (bacterial blight and bacterial pustule), accuracy increases from 83.01% to 88.89% and from 85.71% to 94.05%, respectively. These results support the effectiveness of class-specific data augmentation, especially on datasets with confounding classes. Because only the linear layer is fine-tuned for each candidate strategy, the search is also cheaper than training a classifier from scratch per augmentation policy, which makes it more practical to apply.
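The genetic-algorithm steps listed in the method overview (initialization, evaluation, selection, crossover, mutation) can be illustrated with a minimal sketch. This is not the authors' implementation: the population size, number of generations, number of augmentations, mutation settings, and the crossover and selection variants are all assumptions chosen for illustration. In particular, `toy_fitness` is a hypothetical stand-in for the real evaluation, which per the summary would fine-tune the linear layer of the baseline model under a candidate policy and measure MPCA.

```python
import random

N_CLASSES = 9          # nine classes, as in the soybean dataset
N_AUGS = 4             # number of candidate augmentations (assumed)
POP_SIZE = 20          # assumed
N_GENERATIONS = 30     # assumed
N_PARENTS = 6          # assumed
MUTATION_RATE = 0.1    # chance that a single entry is perturbed (assumed)
MUTATION_SCALE = 0.2   # std. dev. of the Gaussian perturbation (assumed)


def random_policy(rng):
    """Initialization: a class-by-augmentation matrix of probabilities in [0, 1)."""
    return [[rng.random() for _ in range(N_AUGS)] for _ in range(N_CLASSES)]


def crossover(a, b, rng):
    """Uniform crossover: each entry is copied from one of the two parents."""
    return [[x if rng.random() < 0.5 else y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def mutate(policy, rng):
    """Perturb some entries with Gaussian noise, clipped back into [0, 1]."""
    return [[min(1.0, max(0.0, x + rng.gauss(0.0, MUTATION_SCALE)))
             if rng.random() < MUTATION_RATE else x
             for x in row]
            for row in policy]


def toy_fitness(policy, target):
    """Hypothetical fitness: closeness to a hidden target matrix.
    In the paper's setting this would be MPCA after a short fine-tune."""
    err = sum((x - t) ** 2
              for row_p, row_t in zip(policy, target)
              for x, t in zip(row_p, row_t))
    return 1.0 - err / (N_CLASSES * N_AUGS)


def run_ga(fitness, rng):
    population = [random_policy(rng) for _ in range(POP_SIZE)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(N_GENERATIONS):
        ranked = sorted(population, key=fitness, reverse=True)  # evaluation
        history.append(fitness(ranked[0]))
        parents = ranked[:N_PARENTS]                            # selection
        children = []
        while len(children) < POP_SIZE - N_PARENTS:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children  # parents survive (elitism)
    return max(population, key=fitness), history


rng = random.Random(0)
target = random_policy(rng)
best, history = run_ga(lambda p: toy_fitness(p, target), rng)
```

Because the parents are carried over unchanged, the best fitness in `history` never decreases from one generation to the next. With a real fitness function, each evaluation would cost one short fine-tuning run, so caching scores per policy would be a natural optimization.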
### Formula summary

- The mean-per-class accuracy (MPCA) is calculated as:

  \[ \text{MPCA} = \frac{1}{N} \sum_{i=1}^{N} \text{Accuracy}_i \]

  where \( N \) is the number of classes and \( \text{Accuracy}_i \) is the accuracy of the \( i \)-th class.
- The augmentation probability matrix \( p \) is defined as:

  \[ p = (p_{ij}) \]

  where \( p_{ij} \) is the probability of applying the \( j \)-th augmentation technique to samples of the \( i \)-th class, with \( 0 \leq p_{ij} \leq 1 \).

With this method, the authors report improved classification of the confounding classes on the soybean leaf stress dataset while keeping the cost of the augmentation search low.
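To make the two definitions above concrete, here is a small sketch that computes MPCA from labels and predictions and draws augmentations for a sample from a probability matrix. The function names, the toy labels, and the assumption that each augmentation is applied independently with probability \( p_{ij} \) are illustrative choices, not details confirmed by the paper.

```python
import random
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Average the per-class accuracies over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = {c: correct[c] / total[c] for c in total}
    return sum(per_class.values()) / len(per_class), per_class


def sample_augmentations(p, class_idx, rng):
    """Return the indices j of augmentations to apply to one sample of class
    class_idx, each picked independently with probability p[class_idx][j]
    (independent sampling is an assumption of this sketch)."""
    return [j for j, prob in enumerate(p[class_idx]) if rng.random() < prob]


# Toy, imbalanced example: class 2 has only two samples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0, 2, 1]

mpca, per_class = mean_per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# per_class is {0: 1.0, 1: 0.75, 2: 0.5}, so MPCA = 0.75 while overall = 0.8

# Two classes, two augmentations: class 0 always gets augmentation 1 only,
# class 1 always gets augmentation 0 only.
p = [[0.0, 1.0],
     [1.0, 0.0]]
rng = random.Random(0)
chosen_for_class_0 = sample_augmentations(p, 0, rng)  # [1]
chosen_for_class_1 = sample_augmentations(p, 1, rng)  # [0]
```

In the toy example the overall accuracy (0.8) is higher than the MPCA (0.75) because the minority class is the one classified worst. This is why MPCA is a more informative optimization target than overall accuracy on imbalanced datasets.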