Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network
Yang Guoguo, Bao Yidan, Liu Ziyi
DOI: https://doi.org/10.11975/j.issn.1002-6819.2017.06.020
2017-01-01
Abstract: Tea is one of the important cash crops in China, and computer vision plays an important role in pest detection. Automatic classification of insect species in the field is more difficult than generic object classification because of the complex field background and the high appearance similarity among insect species. In this paper, we propose an insect recognition system based on image saliency analysis and a deep learning model, the convolutional neural network (CNN), which is robust and avoids manually selected features. In the image saliency analysis, we first segmented the original images into super-pixel regions. We then quantized each RGB (red, green, blue) color channel to 10 different values, which reduced the number of colors to 1000 and sped up the computation of the color contrast between the pest objects and the background at the region level. Finally, we obtained the saliency value of each region by combining its color contrast and spatial distances. The saliency values of all regions in each image were used to construct a saliency map, which served as the initial area for the GrabCut algorithm to define the segmentation result and localize the pest object. The localized images were resized to 256×256 pixels for CNN training and classification. The CNN was trained end to end, from raw pixels to final categories, thereby removing the need to manually design a suitable feature extractor. Based on theoretical analysis and experimental evaluation, we optimized the critical structure parameters and the training strategy of the CNN to find the best configuration. The overall architecture included a number of sensitive parameters and optimization strategies that could be changed. We set the local receptive field size, the number of kernels, and the convolutional stride to 7×7 pixels, 64, and 4, respectively. The dropout ratio for the fully-connected layers was 0.7. The Softmax loss function proved suitable for the pest classification system.
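The color quantization and region-level saliency steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the region dictionary layout, the Euclidean color distance, the exponential spatial weighting, and the value of `sigma_sq` are all assumptions chosen for illustration, in the spirit of common region-contrast saliency formulations.

```python
import math

def quantize_channel(value, levels=10):
    """Map an 8-bit channel value (0-255) to one of `levels` bins."""
    return min(value * levels // 256, levels - 1)

def quantize_color(rgb, levels=10):
    """Return a single color index in [0, levels**3 - 1] for an RGB triple."""
    r, g, b = (quantize_channel(c, levels) for c in rgb)
    return (r * levels + g) * levels + b

def region_saliency(regions, sigma_sq=0.4):
    """Score each region by its color contrast to all other regions.

    Each region is a dict (hypothetical layout) with:
      'color'  : mean quantized (r, g, b) of the region
      'center' : centroid (x, y), normalized to [0, 1]
      'size'   : number of pixels in the region
    Contrast to nearby and larger regions counts more (assumed weighting).
    Scores are normalized so that the most salient region has value 1.0.
    """
    scores = []
    for k, rk in enumerate(regions):
        total = 0.0
        for i, ri in enumerate(regions):
            if i == k:
                continue
            d_color = math.dist(rk["color"], ri["color"])
            d_space = math.dist(rk["center"], ri["center"])
            total += math.exp(-d_space / sigma_sq) * ri["size"] * d_color
        scores.append(total)
    peak = max(scores) or 1.0
    return [s / peak for s in scores]

# Toy example: two large leaf-colored regions and one small, distinct region.
toy_regions = [
    {"color": (2, 8, 2), "center": (0.2, 0.5), "size": 500},
    {"color": (2, 7, 2), "center": (0.8, 0.5), "size": 450},
    {"color": (8, 2, 1), "center": (0.5, 0.5), "size": 50},
]
toy_scores = region_saliency(toy_regions)
```

In this toy example the small region whose color differs from the two background regions receives the highest score. A map built from such scores is the kind of input that could seed GrabCut, although the seeding step itself is not shown here.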
To further improve the practical utility of the CNN, we focused on structural changes to the overall architecture that enabled faster running with only small effects on performance. We analyzed the performance and the corresponding runtime of our model while reducing its depth (number of layers) and width (number of convolution kernels in each layer). Removing the fully-connected layers (FC6, FC7) made only a slight difference to overall performance. These layers contained almost 90% of the parameters, and when they were removed, the memory consumption decreased to 29.8 MB. However, removing the intermediate convolutional layers (Conv2, Conv3, Conv4, Conv5) resulted in a dramatic decrease in both accuracy and runtime. This suggested that the intermediate convolutional layers accounted for most of the computational cost, and that their depth was important for achieving good results. We then investigated the effects of adjusting the sizes of all convolutional layers, with the filters in each convolutional layer reduced to 64 each time. Surprisingly, all architectures showed significant decreases in running time with relatively small effects on performance. Finally, we set the convolution kernel numbers of Conv2-Conv5 to 64-192-192-64. On the test set of tea field images, the architectures before and after shrinking achieved average accuracies (AA) of 0.915 and 0.881, respectively, both superior to previous methods for pest image recognition. Furthermore, after optimization the running time was reduced to 0.7 ms and the memory required was 6 MB.
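To make the effect of layer width on model size concrete, the following Python sketch computes the output size and parameter count of the first convolutional layer from the values stated above (256×256 input, 7×7 kernels, 64 kernels, stride 4), and the parameter count of Conv2-Conv5 at the final widths 64-192-192-64. The kernel sizes of Conv2-Conv5 (`ASSUMED_KERNELS`), the absence of padding, the three-channel input, and ungrouped convolutions are assumptions, since the abstract does not state them, so the totals are illustrative only.

```python
def conv_output_size(input_size, kernel, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square, ungrouped convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def stack_params(first_in, widths, kernels):
    """Total parameters of consecutive conv layers with the given widths."""
    total, c_in = 0, first_in
    for k, c_out in zip(kernels, widths):
        total += conv_params(k, c_in, c_out)
        c_in = c_out
    return total

# First layer, using the values given in the abstract (no padding assumed).
conv1_out = conv_output_size(256, 7, 4)
conv1_params = conv_params(7, 3, 64)

# Conv2-Conv5 at the final widths; kernel sizes are an assumption.
ASSUMED_KERNELS = [5, 3, 3, 3]
SHRUNK_WIDTHS = [64, 192, 192, 64]
shrunk_total = stack_params(64, SHRUNK_WIDTHS, ASSUMED_KERNELS)
```

Because the parameter count of each layer scales with the product of its input and output channel counts, reducing the widths of adjacent layers together shrinks the model roughly quadratically, which is consistent with the reported drop in memory use after shrinking.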