Abstract: Automated wildlife surveys based on drone imagery and object detection technology are a powerful and increasingly popular tool in conservation biology. Most detectors require training images with annotated bounding boxes, which are tedious, expensive, and not always unambiguous to create. To reduce the annotation load associated with this practice, we develop POLO, a multi-class object detection model that can be trained entirely on point labels. POLO is based on simple, yet effective modifications to the YOLOv8 architecture, including alterations to the prediction process, training losses, and post-processing. We test POLO on drone recordings of waterfowl containing up to multiple thousands of individual birds in one image and compare it to a regular YOLOv8. Our experiments show that at the same annotation cost, POLO achieves improved accuracy in counting animals in aerial imagery.

What problem does this paper attempt to address?

This paper aims to reduce the annotation cost of drone-based wildlife surveys while improving the accuracy of animal counts. Specifically:

1. **High annotation cost**: Conventional object detection models (such as YOLOv8) require a large number of training images annotated with bounding boxes. Creating these annotations is time-consuming and expensive, and in some cases the correct box is ambiguous.
2. **Small-object detection challenges**: In drone images, animals are usually very small (only a few pixels) and may be partially occluded or distorted by perspective and motion blur. As a result, automatically created bounding boxes are of poor quality, which degrades detection accuracy.

To address these problems, the authors developed POLO (Point-based, multi-class animal detection), a multi-class object detection model that can be trained entirely on point labels instead of bounding box annotations. By modifying the prediction process, loss function, and post-processing steps of the YOLOv8 architecture, POLO achieves higher animal-counting accuracy at the same annotation cost.

### Main Contributions

- **Point-label training**: POLO can be trained directly on point labels, reducing the annotation workload.
- **Modified YOLOv8 architecture**: Simple but effective changes were made to YOLOv8, covering the output dimensions, loss functions, and post-processing methods.
- **Experimental verification**: Tests on a drone-image dataset of Izembek Lagoon in Alaska show that POLO outperforms the regular YOLOv8 model in counting multiple species.

### Formula Summary

- **Center-point prediction**:

  \[ \hat{p}_x = \sigma(a_1) \cdot 2 - 0.5 + c_x \]

  \[ \hat{p}_y = \sigma(a_2) \cdot 2 - 0.5 + c_y \]

  where \(\hat{p}_x\) and \(\hat{p}_y\) are the predicted coordinates, \(a_1\) and \(a_2\) are the activation values of the grid cell in the first and second output channels, \(\sigma(\cdot)\) is the sigmoid function, and \(c_x\) and \(c_y\) are the coordinates of the upper-left corner of the grid cell.

- **Average Hausdorff distance loss**:

  \[ L_{AH}(\hat{P}, P) = \frac{1}{|P|}\sum_{i=1}^{|P|}\min_{\hat{p}\in\hat{P}} d(\hat{p}, p_i) + \frac{1}{|\hat{P}|}\sum_{j=1}^{|\hat{P}|}\min_{p\in P} d(\hat{p}_j, p) \]

  where \(P\) is the set of ground-truth points and \(\hat{P}\) is the set of predicted points.

- **Mean squared error loss**:

  \[ L_{MSE} = \frac{1}{|P|}\sum_{i=1}^{|P|}\|p_i - \hat{p}_i\|_2^2 \]

  where \(\hat{p}_i\) denotes the prediction paired with ground-truth point \(p_i\).

- **Distance-over-Radius (DoR) metric**:

  \[ DoR = \frac{d(\hat{p}, p)}{r_c} \]

  where \(d(\hat{p}, p)\) is the Euclidean distance between the predicted point and the true position, and \(r_c\) is a user-specified radius for each object/animal category.

With these changes, POLO reduces annotation cost while performing well on multi-species counting tasks, with a particular advantage for small targets and dense scenes.
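To make the loss and metric formulas above more concrete, the following is a minimal Python sketch that evaluates the average Hausdorff distance, the mean squared error, and the DoR on a few toy points. It is an illustration only, not the authors' implementation: the function names, the toy coordinates, and the radius value are all hypothetical, and the MSE helper simply assumes that each prediction has already been paired with its ground-truth point.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance d(a, b) between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def average_hausdorff(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """L_AH: mean nearest-prediction distance over the ground truth,
    plus mean nearest-ground-truth distance over the predictions."""
    if not pred or not truth:
        raise ValueError("both point sets must be non-empty")
    truth_term = sum(min(euclidean(q, p) for q in pred) for p in truth) / len(truth)
    pred_term = sum(min(euclidean(q, p) for p in truth) for q in pred) / len(pred)
    return truth_term + pred_term


def mse(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """L_MSE, assuming pred[i] is already paired with truth[i]
    (the pairing strategy is not covered here)."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("point lists must be non-empty and of equal length")
    return sum(euclidean(q, p) ** 2 for q, p in zip(pred, truth)) / len(truth)


def dor(pred_point: Point, true_point: Point, radius: float) -> float:
    """Distance-over-Radius: Euclidean distance scaled by the class radius r_c."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    return euclidean(pred_point, true_point) / radius


if __name__ == "__main__":
    # Hypothetical toy data: two ground-truth birds, three predicted points.
    truth = [(0.0, 0.0), (10.0, 0.0)]
    pred = [(0.0, 3.0), (10.0, 4.0), (20.0, 0.0)]

    print("average Hausdorff:", average_hausdorff(pred, truth))
    print("MSE on paired points:", mse(pred[:2], truth))
    print("DoR with r_c = 6:", dor(pred[0], truth[0], radius=6.0))
```

In this toy case the unmatched third prediction raises the second term of the average Hausdorff distance, which illustrates why that loss penalizes spurious detections as well as missed animals. The DoR value is dimensionless, so the same threshold can be reused across classes with different radii.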

POLO -- Point-based, multi-class animal detection

YoloXT: A Object Detection Algorithm for Marine Benthos

YOLO for Penguin Detection and Counting Based on Remote Sensing Images

POSEIDON: A Data Augmentation Tool for Small Object Detection Datasets in Maritime Environments

Optimization Research of Bird Detection Algorithm Based on YOLO in Deep Learning Environment

HP-YOLOv8: High-Precision Small Object Detection Algorithm for Remote Sensing Images

YOLO-Q: Drone Aerial Target Detection

PLOD-YOLO: Premium Lightweight Object Detection for Autonomous Following Robot

Multi-Species Object Detection in Drone Imagery for Population Monitoring of Endangered Animals

YOLOD: A Target Detection Method for UAV Aerial Imagery

An algorithm for cattle counting in rangeland based on multi‐scale perception and image association

Pest-YOLO: A model for large-scale multi-class dense and tiny pest detection and counting

MS-YOLO: integration-based multi-subnets neural network for object detection in aerial images

LAM-YOLO: Drones-based Small Object Detection on Lighting-Occlusion Attention Mechanism YOLO

Efficient Multi-Receptive Pooling YOLOv5 with Coordinate Attention Module for Object Detection on Drone

VAMYOLOX: an Accurate and Efficient Object Detection Algorithm Based on Visual Attention Mechanism for UAV Optical Sensors

An Efficient Method for Monitoring Birds Based on Object Detection and Multi-Object Tracking Networks

Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery Using Deep CNNs and Active Learning

Modular YOLOv8 optimization for real-time UAV maritime rescue object detection

Wildlife Object Detection Method Applying Segmentation Gradient Flow and Feature Dimensionality Reduction

A first step towards automated species recognition from camera trap images of mammals using AI in a European temperate forest