Data Augmentation for Deep Learning Based Semantic Segmentation and Crop-Weed Classification in Agricultural Robotics

Daobilige Su, He Kong, Yongliang Qiao, Salah Sukkarieh
DOI: https://doi.org/10.1016/j.compag.2021.106418
IF: 8.3
2021-01-01
Computers and Electronics in Agriculture
Highlights:
• A data augmentation method is proposed to train deep neural nets with limited data.
• Random image patching is used to generate new images for training.
• An edge matching cost is introduced to select the best generated images.
• Mean accuracy increases from 91.01 to 94.02 and from 97.99 to 98.51 on the two datasets.
• Mean IOU increases from 63.59 to 70.77 and from 74.26 to 77.09 on the two datasets.

Abstract: Deep learning methods such as convolutional neural networks (CNN) have become popular for addressing crop and weed classification problems in agricultural robotics. However, to achieve satisfactory performance and avoid overfitting, training deep neural nets typically requires thousands of labeled images. For semantic segmentation, this entails tedious pixelwise labeling. In this paper, we build on recent developments in data augmentation and extend the concept to semantic segmentation and classification of crops and weeds. Specifically, we propose a novel data augmentation framework based on the random image cropping and patching (RICAP) method, which was originally designed to augment data for generic image classification. The proposed framework introduces novel enhancements to the original RICAP so that it can be used effectively for data augmentation in semantic segmentation tasks. We evaluate the proposed methodology on two datasets from different farms. Comprehensive experimental evaluations and ablation studies show that the proposed framework effectively improves segmentation accuracy, and that the enhancements made over the original RICAP contribute to the performance gain. On average, relative to a deep neural net trained with conventional data augmentation (random flipping, rotation and colour jitter), the proposed method increases the mean accuracy and mean intersection over union (IOU) from 91.01 to 94.02 and from 63.59 to 70.77, respectively, for the Narrabri dataset, and from 97.99 to 98.51 and from 74.26 to 77.09, respectively, for the Bonn dataset. The limitations of the proposed method, especially when a large amount of training data is available, are also discussed.
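To make the patching idea concrete, the following is a minimal sketch of how RICAP-style augmentation might be carried over to segmentation, where each cropped image patch brings its pixelwise label patch with it. It is not the authors' implementation: the function names (`ricap_segmentation`, `seam_cost`, `best_of_candidates`), the `beta` and `n_candidates` parameters, and in particular the seam-based cost are assumptions chosen for illustration, since the abstract does not define the edge matching cost.

```python
import numpy as np


def ricap_segmentation(images, labels, rng, beta=0.3):
    """Patch random crops from four (image, label) pairs into one new pair.

    images: list of four H x W x C arrays; labels: list of four H x W arrays.
    Returns the patched image, the patched label map and the boundary (x, y).
    """
    h, w = labels[0].shape
    # Boundary position drawn from a Beta distribution, as in RICAP.
    x = int(np.round(w * rng.beta(beta, beta)))
    y = int(np.round(h * rng.beta(beta, beta)))
    # Size and placement of the four patches: top-left, top-right,
    # bottom-left, bottom-right.
    widths = [x, w - x, x, w - x]
    heights = [y, y, h - y, h - y]
    lefts = [0, x, 0, x]
    tops = [0, 0, y, y]

    new_image = np.zeros_like(images[0])
    new_label = np.zeros_like(labels[0])
    for k in range(4):
        pw, ph = widths[k], heights[k]
        # Random crop origin inside source k; the same window is applied
        # to the image and to its label map so they stay aligned.
        sx = int(rng.integers(0, w - pw + 1))
        sy = int(rng.integers(0, h - ph + 1))
        t, l = tops[k], lefts[k]
        new_image[t:t + ph, l:l + pw] = images[k][sy:sy + ph, sx:sx + pw]
        new_label[t:t + ph, l:l + pw] = labels[k][sy:sy + ph, sx:sx + pw]
    return new_image, new_label, (x, y)


def seam_cost(image, x, y):
    """Hypothetical edge matching cost: mean absolute intensity jump
    across the vertical and horizontal patch boundaries."""
    img = image.astype(np.float64)
    cost = 0.0
    if 0 < x < img.shape[1]:
        cost += np.abs(img[:, x] - img[:, x - 1]).mean()
    if 0 < y < img.shape[0]:
        cost += np.abs(img[y] - img[y - 1]).mean()
    return float(cost)


def best_of_candidates(images, labels, rng, n_candidates=8):
    """Generate several patched candidates and keep the smoothest one."""
    candidates = [ricap_segmentation(images, labels, rng)
                  for _ in range(n_candidates)]
    return min(candidates, key=lambda c: seam_cost(c[0], c[2][0], c[2][1]))


# Usage with four synthetic image/label pairs.
rng = np.random.default_rng(0)
images = [np.full((64, 96, 3), k + 1, dtype=np.uint8) for k in range(4)]
labels = [np.full((64, 96), k + 1, dtype=np.int64) for k in range(4)]
aug_image, aug_label, boundary = best_of_candidates(images, labels, rng)
```

Selecting the lowest-cost candidate is one plausible reading of "select optimum generated images"; a real pipeline would feed `aug_image` and `aug_label` to the segmentation network alongside conventionally augmented samples.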