Semi-supervised deep learning and low-cost cameras for the semantic segmentation of natural images in viticulture

A. Casado-García,J. Heras,A. Milella,R. Marani
DOI: https://doi.org/10.1007/s11119-022-09929-9
IF: 5.767
2022-06-23
Precision Agriculture
Abstract:Automatic yield monitoring and in-field robotic harvesting by low-cost cameras require object detection and segmentation solutions to tackle the poor quality of natural images and the lack of exactly-labeled datasets of consistent sizes. This work proposed the application of deep learning for semantic segmentation of natural images acquired by a low-cost RGB-D camera in a commercial vineyard. Several deep architectures were trained and compared on 85 labeled images. Three semi-supervised learning methods (PseudoLabeling, Distillation and Model Distillation) were proposed to take advantage of 320 non-annotated images. In these experiments, the DeepLabV3+ architecture with a ResNext50 backbone, trained with the set of labeled images, achieved the best overall accuracy of 84.78%. In contrast, the Manet architecture combined with the EfficientnetB3 backbone reached the highest accuracy for the bunch class (85.69%). The application of semi-supervised learning methods boosted the segmentation accuracy between 5.62 and 6.01%, on average. Further discussions are presented to show the effects of a fine-grained manual image annotation on the accuracy of the proposed methods and to compare time requirements.
agriculture, multidisciplinary
What problem does this paper attempt to address?