Combining convolutional and vision transformer structures for sheep face recognition

Xiaopeng Li, Yuyun Xiang, Shuqin Li
DOI: https://doi.org/10.1016/j.compag.2023.107651
IF: 8.3
2023-01-20
Computers and Electronics in Agriculture
Abstract: Significant progress has been made in individual livestock recognition based on convolutional neural networks (CNNs); however, their performance still needs improvement. The vision transformer (ViT) has emerged as a cutting-edge approach and has been applied successfully to many computer vision tasks. Its superior performance motivates us to study whether ViT can provide more accurate results for sheep face recognition. In this study, we propose MobileViTFace, a lightweight sheep face recognition model that combines convolutional and transformer structures. Compared with the standard ViT model, MobileViTFace requires less training data and has lower computational complexity, which makes it easier to deploy on edge devices. Extensive benchmarking tests show that MobileViTFace achieves competitive performance: it reaches 97.13% recognition accuracy on a dataset of 7,434 face images from 186 sheep, significantly outperforming lightweight convolutional models such as MobileNet and EfficientNet. Compared with ResNet-50, which achieves similar recognition accuracy, MobileViTFace reduces the number of parameters and floating-point operations (FLOPs) by a factor of five. Real-time, accurate recognition results are obtained on an edge computing platform based on the Jetson Nano, which is helpful for practical production.
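The abstract does not detail how MobileViTFace combines convolution and attention. Assuming it builds on the MobileViT block, as its name suggests, the idea can be sketched roughly as follows: a convolution extracts local features, the feature map is unfolded into non-overlapping patches, self-attention mixes information across patches, and the result is folded back and fused with the input. The code below is a simplified, hypothetical NumPy sketch, not the authors' implementation. It uses single-head attention and omits normalization, activations, and the feed-forward layers of a real transformer. All function names, parameter names, and sizes are illustrative assumptions.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2-D convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    _, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((w.shape[0], H, W))
    for i in range(k):
        for j in range(k):
            # Contract the kernel tap (i, j) over input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def unfold(x, ph, pw):
    """(d, H, W) -> (P, N, d): P = ph*pw pixel positions, N = number of patches."""
    d, H, W = x.shape
    nh, nw = H // ph, W // pw
    t = x.reshape(d, nh, ph, nw, pw).transpose(2, 4, 1, 3, 0)  # (ph, pw, nh, nw, d)
    return t.reshape(ph * pw, nh * nw, d)

def fold(t, ph, pw, H, W):
    """Inverse of unfold: (P, N, d) -> (d, H, W)."""
    d = t.shape[-1]
    nh, nw = H // ph, W // pw
    x = t.reshape(ph, pw, nh, nw, d).transpose(4, 2, 0, 3, 1)  # (d, nh, ph, nw, pw)
    return x.reshape(d, H, W)

def self_attention(t, wq, wk, wv):
    """Single-head attention across the N patches, separately for each of the P positions."""
    q, k, v = t @ wq, t @ wk, t @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (P, N, N)
    scores -= scores.max(axis=-1, keepdims=True)               # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v                                            # (P, N, d)

def mobilevit_style_block(x, params, ph=2, pw=2):
    """Hypothetical simplified block: local conv -> global attention -> fusion."""
    H, W = x.shape[1:]
    local = conv2d_same(x, params["conv3x3"])      # local spatial features
    local = conv2d_same(local, params["proj_in"])  # 1x1 conv to d channels
    t = unfold(local, ph, pw)
    t = t + self_attention(t, params["wq"], params["wk"], params["wv"])  # residual
    glob = conv2d_same(fold(t, ph, pw, H, W), params["proj_out"])  # 1x1 conv back to C
    fused = np.concatenate([x, glob], axis=0)      # (2C, H, W)
    return conv2d_same(fused, params["fuse"])      # 3x3 fusion conv -> (C, H, W)

rng = np.random.default_rng(0)
C, d, H, W = 8, 16, 8, 8
params = {
    "conv3x3": rng.normal(size=(C, C, 3, 3)) * 0.1,
    "proj_in": rng.normal(size=(d, C, 1, 1)) * 0.1,
    "wq": rng.normal(size=(d, d)) * 0.1,
    "wk": rng.normal(size=(d, d)) * 0.1,
    "wv": rng.normal(size=(d, d)) * 0.1,
    "proj_out": rng.normal(size=(C, d, 1, 1)) * 0.1,
    "fuse": rng.normal(size=(C, 2 * C, 3, 3)) * 0.1,
}
x = rng.normal(size=(C, H, W))
y = mobilevit_style_block(x, params)
print(y.shape)  # (8, 8, 8)
```

In this sketch, attention runs over the N patches at each of the P pixel positions instead of over all H·W pixels. Because P·N = H·W, the attention cost drops from about (H·W)²·d to H·W·N·d, a factor of P. That trade-off is one plausible reason a hybrid model of this kind can stay lighter than a standard ViT.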
Agriculture, Multidisciplinary; Computer Science, Interdisciplinary Applications