Livestock Fish Larvae Counting using DETR and YOLO based Deep Networks

Daniel Ortega de Carvalho, Luiz Felipe Teodoro Monteiro, Fernanda Marques Bazilio, Gabriel Toshio Hirokawa Higa, Hemerson Pistori
2024-08-09
Abstract: Counting fish larvae is an important, yet demanding and time-consuming, task in aquaculture. To address this problem, we evaluate four neural network architectures, including convolutional neural networks and transformers, in different sizes, on the task of fish larvae counting. For the evaluation, we present a new annotated image dataset with fewer data collection requirements than preceding works, with images of spotted sorubim and dourado larvae. By using image tiling techniques, we achieve a MAPE of 4.46% ($\pm 4.70$) with an extra-large real-time detection transformer, and 4.71% ($\pm 4.98$) with a medium-sized YOLOv8.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of counting fish larvae in aquaculture. Traditionally, larvae are identified and counted by hand, which is time-consuming and prone to error. To improve efficiency and accuracy, the authors use deep learning models to detect and count larvae automatically. Specifically, they evaluate four neural network architectures (both convolutional and transformer-based) in different sizes, and apply image tiling (slicing) to process the images.

### Main Contributions and Objectives:

1. **Creation of a New Dataset**: The researchers built a new annotated image dataset of spotted sorubim and dourado larvae. The images were captured with smartphones, so the acquisition requirements are lower than in previous works, which makes the dataset more representative of real-world conditions.
2. **Model Evaluation**: The paper evaluates four neural network architectures: YOLOv8 in five sizes (nano, small, medium, large, extra large), the Real-Time Detection Transformer (RT-DETR), the Detection Transformer with a ResNet-50 backbone (DETR), and the Deformable Detection Transformer (Deformable DETR). The models were assessed on the task of counting larvae in high-resolution images.
3. **Application of Image Slicing Technique**: To work within hardware limitations, the researchers used image slicing, which divides the original image into multiple smaller tiles that are processed separately. This lets the models handle high-resolution images on limited hardware.
4. **Performance Metrics**: The extra-large RT-DETR achieved the lowest Mean Absolute Percentage Error (MAPE), 4.46% (standard deviation ±4.70), while the medium-sized YOLOv8 model reached a MAPE of 4.71% (standard deviation ±4.98).
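The MAPE figures in item 4 can be illustrated with a short sketch. It computes the absolute percentage error of the predicted count for each image, then reports the mean and the spread. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the summary does not say which estimator the authors used.

```python
import statistics


def absolute_percentage_errors(true_counts, predicted_counts):
    """Per-image absolute percentage error between true and predicted counts."""
    if len(true_counts) != len(predicted_counts):
        raise ValueError("both sequences must have the same length")
    errors = []
    for true, predicted in zip(true_counts, predicted_counts):
        if true <= 0:
            # The percentage error is undefined when the true count is zero.
            raise ValueError("true counts must be positive")
        errors.append(100.0 * abs(predicted - true) / true)
    return errors


def mape_with_std(true_counts, predicted_counts):
    """Return (MAPE, standard deviation) of the per-image percentage errors.

    Assumption: population standard deviation (statistics.pstdev).
    """
    errors = absolute_percentage_errors(true_counts, predicted_counts)
    return statistics.mean(errors), statistics.pstdev(errors)


# Illustrative counts only; these are not data from the paper.
mape, std = mape_with_std([100, 200, 50], [95, 210, 50])
print(f"MAPE = {mape:.2f}% (+/- {std:.2f})")
```

A standard deviation close to the mean, as in the reported 4.46% (±4.70), indicates that the per-image errors vary widely, so a few images with large errors may account for much of the average.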
In summary, the goal of this study is to develop a deep learning-based method that counts fish larvae automatically and efficiently, reducing the need for manual work and improving the efficiency and sustainability of aquaculture.
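The tiling-then-counting idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tiles do not overlap, the per-tile counts are simply summed, and `count_in_tile` is a hypothetical callable that stands in for running a detector (such as YOLOv8 or RT-DETR) on one tile.

```python
def tile_boxes(width, height, tile_size):
    """Return (left, top, right, bottom) boxes that cover the image without overlap.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    boxes = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            right = min(left + tile_size, width)
            bottom = min(top + tile_size, height)
            boxes.append((left, top, right, bottom))
    return boxes


def count_with_tiling(width, height, tile_size, count_in_tile):
    """Sum the counts that `count_in_tile` (hypothetical) returns for each tile."""
    return sum(count_in_tile(box) for box in tile_boxes(width, height, tile_size))


# Usage with a stand-in counter that reports one larva per tile.
boxes = tile_boxes(1000, 600, 512)
total = count_with_tiling(1000, 600, 512, lambda box: 1)
print(len(boxes), total)
```

With non-overlapping tiles, a larva that straddles a tile border may be split, missed, or counted twice. A practical pipeline would typically add overlap between tiles and merge duplicate detections; that step is omitted here.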