Cucumber disease recognition with small samples using image-text-label-based multi-modal language model

Yiyi Cao, Lei Chen, Yuan Yuan, Guangling Sun
DOI: https://doi.org/10.1016/j.compag.2023.107993
IF: 8.3
2023-06-25
Computers and Electronics in Agriculture
Abstract: Few-shot learning methods need only a small number of samples to train a good model. However, most of these methods consider a single modality and ignore the correlation between multi-modal data. Therefore, using multi-modal methods to solve the small-sample-size problem has become a development trend in artificial intelligence. In recent years, a multi-modal method called Vision-Language Pre-training (VLP) has emerged. Pre-training learns the semantic relations between modalities, which yields better performance on downstream tasks. Accordingly, this paper takes cucumber disease recognition with small samples as an example and proposes a recognition method that uses a multi-modal language model based on image-text-label information. First, image-text multi-modal contrastive learning, image self-supervised contrastive learning, and label information were combined to measure the distance between samples in the common image-text-label space. Second, classification methods and optimization of large-scale vision-language pre-training on small-sample cucumber datasets were studied. The proposed model achieved a recognition accuracy of 94.84% on a small multi-modal cucumber disease dataset. Finally, experiments on a public dataset demonstrated that the method generalizes well.
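The abstract names three signals (image-text contrastive learning, image self-supervised contrastive learning, and label information) but does not give the loss formulation. The following is a minimal sketch of one way such terms could be combined, assuming an InfoNCE-style loss for each term and a simple weighted sum. The function names, the temperature of 0.07, the weights `w_it`, `w_ii`, `w_lab`, and the choice to treat same-label texts as extra positives are all illustrative assumptions, not the authors' method.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def info_nce(queries, keys, positives, temperature=0.07):
    """Mean InfoNCE-style loss.

    positives[i] lists the key indices treated as positives for query i.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -sum(logits[j] - log_denom for j in positives[i]) / len(positives[i])
    return total / len(q)


def combined_loss(img, img_aug, txt, labels, w_it=1.0, w_ii=1.0, w_lab=1.0):
    """Hypothetical weighted sum of the three terms named in the abstract."""
    n = len(img)
    diag = [[i] for i in range(n)]  # sample i pairs with key i
    # Assumption: every text whose label matches the image's label is a positive.
    same_label = [[j for j in range(n) if labels[j] == labels[i]] for i in range(n)]
    loss_it = 0.5 * (info_nce(img, txt, diag) + info_nce(txt, img, diag))
    loss_ii = info_nce(img, img_aug, diag)  # two views of the same image
    loss_lab = info_nce(img, txt, same_label)
    return w_it * loss_it + w_ii * loss_ii + w_lab * loss_lab
```

With toy embeddings, well-aligned image and text vectors should give a lower loss than mismatched ones, for example `combined_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]], [[1, 0], [0, 1]], [0, 1])` versus the same call with the text vectors swapped.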
Agriculture, Multidisciplinary; Computer Science, Interdisciplinary Applications