Adaptive density guided network with CNN and Transformer for underwater fish counting
Shijian Zheng, Rujing Wang, Shitao Zheng, Liusan Wang, Hongkui Jiang
DOI: https://doi.org/10.1016/j.jksuci.2024.102088
IF: 9.006
2024-06-10
Journal of King Saud University - Computer and Information Sciences
Abstract:
Highlights
• Introduction of a UNet-like multi-level feature fusion structure for improved alignment of fish targets across different scales and feature pyramid levels, addressing scale and deformation challenges.
• Implementation of a density-guided adaptive selection module that combines CNN and Transformer architectures to enhance information exchange in low- and high-density areas, effectively handling uneven fish density distribution.
• Construction of the SHUFD and RHUFD high-density datasets with 500 and 1750 images, respectively, along with corresponding point labels, serving as resources for validating high-density fish counting methods.
• Achievement of the best performance with the proposed counting network, which reduces model parameters to 9.1M, six times fewer than the previous state-of-the-art method (CUT).
Accurate assessment of high-density underwater fish resources is vital to the aquaculture industry, as it directly informs the formulation of fishery insurance strategies and the implementation of breeding plans. However, counting fish accurately in high-density environments is challenging because fish density is unevenly distributed and individual fish vary in size and posture. To address this, we propose an adaptive density-guided network for high-density fish counting. First, the network adopts a multi-level feature fusion structure similar to UNet, which improves the matching between fish targets of different scales and feature pyramid levels and alleviates the problems caused by scale changes and morphological deformations. Second, the network introduces a density-guided adaptive selection module that assesses the suitability of Convolutional Neural Network (CNN) and Transformer blocks for regions of different density, thereby achieving robust information transfer and interaction between blocks.
Finally, to verify the effectiveness of this method, we constructed two high-density datasets: a simulated high-density underwater fish image dataset (SHUFD) and a real high-density underwater fish image dataset (RHUFD). Compared with the state-of-the-art method (CUT), the proposed method improves mean absolute error by 3.44% and 6.47%, mean square error by 11.43% and 4.41%, background region bias by 23.91% and 29.48%, foreground region bias by 4.43% and 10.33%, and density map bias by 8.3% and 13.14% on SHUFD and RHUFD, respectively.
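The abstract reports percentage improvements in counting error without defining the metrics. The sketch below shows how such numbers are commonly computed in the counting literature, and it rests on assumptions: that mean absolute error is taken over per-image counts, that the squared-error metric is reported as the root of the mean squared error (a frequent convention, not confirmed by this abstract), and that "improvement" means the relative reduction of an error with respect to the baseline. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def rmse(pred_counts, true_counts):
    """Root of the mean squared count error (assumed convention for 'MSE')."""
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Toy counts for three images, made up purely for illustration.
    predicted = [10, 20, 30]
    actual = [12, 18, 33]
    print("MAE:", mae(predicted, actual))
    print("RMSE:", rmse(predicted, actual))
    # A baseline error of 10.0 reduced to 9.0 is a 10% relative improvement.
    print("Improvement (%):", relative_improvement(10.0, 9.0))
```

Under these assumptions, a lower error for the proposed method yields a positive improvement percentage, which is how the paired figures for SHUFD and RHUFD would be read.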
computer science, information systems