Attention-based Hierarchical Convolution Neural Network for Fine-grained Crop Image Classification

Jiannan Yang, Fan Zhang, Tiantian Qian
DOI: https://doi.org/10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics50389.2020.00035
2020-01-01
Abstract: Fine-grained crops, such as rice and dried tea leaves, are small and usually densely overlapped in images, so a single sample of such an object cannot represent the features of a cluster of samples. This poses significant challenges for recognizing this kind of object. In this paper, we use mobile phone cameras to collect images of fine-grained crops (as shown in Fig. 1) and propose an attention-based Hierarchical Convolution Neural Network (H-CNN) to classify fine-grained crop images efficiently, using tea ranked by quality as a case study. We established classification models for four categories of tea (Meitan Turquoise Bud (MTB), Zunyi black tea, Biluochun, and Longjing tea), each with five quality grades. The major results are: (1) A model trained on images from a single mobile phone generalizes very poorly: its test accuracy is low on images collected by other mobile phones. When images collected by two different mobile phones are used for training, the model achieves significantly higher test accuracy on a third phone. Using three or more mobile phones for training yields only marginal further improvement (as shown in Fig. 2). (2) H-CNN with the attention mechanism achieves an average accuracy of more than 93%, and its prediction accuracy on images taken by other mobile phones also exceeds 87%, which is superior to existing CNN models (89% and 82.7%, respectively, using VGG19 [29]).
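The cross-device evaluation described in result (1) can be illustrated with a minimal sketch: hold out all images from one phone for testing and train on images from a chosen number of the remaining phones. This is not the authors' code. The device names (phone_a, phone_b, phone_c), the record layout, and the function cross_device_splits are hypothetical and introduced only for illustration; only the four tea categories and the five grades come from the abstract.

```python
from itertools import combinations

# Tea categories and grade count are taken from the abstract.
TEAS = ["Meitan Turquoise Bud", "Zunyi black tea", "Biluochun", "Longjing"]
GRADES = range(1, 6)

# Hypothetical device names; the abstract does not name the phones.
DEVICES = ["phone_a", "phone_b", "phone_c"]

# Toy metadata: one record per (device, tea, grade), standing in for images.
samples = [
    {"device": d, "tea": t, "grade": g}
    for d in DEVICES
    for t in TEAS
    for g in GRADES
]


def cross_device_splits(samples, n_train_devices):
    """Yield (train_devices, test_device, train, test) tuples.

    Each device is held out in turn as the test set, and every
    combination of n_train_devices other devices forms a training set,
    so no device contributes to both sides of a split.
    """
    devices = sorted({s["device"] for s in samples})
    for test_device in devices:
        others = [d for d in devices if d != test_device]
        for train_devices in combinations(others, n_train_devices):
            train = [s for s in samples if s["device"] in train_devices]
            test = [s for s in samples if s["device"] == test_device]
            yield train_devices, test_device, train, test


# Train on two phones, test on the remaining third phone.
splits = list(cross_device_splits(samples, n_train_devices=2))
for train_devices, test_device, train, test in splits:
    print(train_devices, "->", test_device, len(train), len(test))
```

Calling the function with n_train_devices=1 gives the single-phone training setting, which makes it possible to compare how accuracy on an unseen phone changes as more training devices are added.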