3D-POP -- An automated annotation approach to facilitate markerless 2D-3D tracking of freely moving birds with marker-based motion capture

Hemal Naik, Alex Hoi Hang Chan, Junran Yang, Mathilde Delacoux, Iain D. Couzin, Fumihiro Kano, Máté Nagy
2023-03-23
Abstract: Recent advances in machine learning and computer vision are revolutionizing the field of animal behavior by enabling researchers to track the poses and locations of freely moving animals without any marker attachment. However, large datasets of annotated images of animals for markerless pose tracking, especially high-resolution images taken from multiple angles with accurate 3D annotations, are still scant. Here, we propose a method that uses a motion capture (mo-cap) system to obtain a large amount of annotated data on animal movement and posture (2D and 3D) in a semi-automatic manner. Our method is novel in that it extracts the 3D positions of morphological keypoints (e.g., eyes, beak, tail) in reference to the positions of markers attached to the animals. Using this method, we obtained, and offer here, a new dataset, 3D-POP, with approximately 300k annotated frames (4 million instances) in the form of videos of groups of one to ten freely moving birds recorded from 4 different camera views in a 3.6 m × 4.2 m area. 3D-POP is the first dataset of flocking birds with accurate keypoint annotations in 2D and 3D along with bounding boxes and individual identities, and it will facilitate the development of solutions for problems of 2D to 3D markerless pose, trajectory tracking, and identification in birds.
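The abstract's central idea, locating keypoints such as the beak or eyes "in reference to the positions of markers", amounts to a rigid-body transfer: each keypoint is expressed once in a coordinate frame defined by the bird's markers, and that frame is re-estimated from the tracked markers in every mo-cap sample. The Python sketch below illustrates this under stated assumptions: at least three non-collinear markers per bird, noise-free marker positions, and a frame built from three markers by Gram-Schmidt orthogonalization. All function names are hypothetical; this is not the 3D-POP authors' implementation.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
# A marker frame is an origin plus three orthonormal axes (x, y, z).
Frame = Tuple[Vec3, Tuple[Vec3, Vec3, Vec3]]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def marker_frame(m0: Vec3, m1: Vec3, m2: Vec3) -> Frame:
    """Right-handed orthonormal frame from three non-collinear markers (Gram-Schmidt)."""
    x = normalize(sub(m1, m0))
    v = sub(m2, m0)
    y = normalize(sub(v, scale(x, dot(v, x))))  # remove the x component of v
    z = cross(x, y)
    return m0, (x, y, z)


def to_local(p: Vec3, frame: Frame) -> Vec3:
    """Express a world point in marker-frame coordinates (done once, at annotation time)."""
    origin, axes = frame
    d = sub(p, origin)
    return (dot(d, axes[0]), dot(d, axes[1]), dot(d, axes[2]))


def to_world(k: Vec3, frame: Frame) -> Vec3:
    """Carry a stored local keypoint into world coordinates for the current mo-cap sample."""
    origin, (x, y, z) = frame
    return add(origin, add(scale(x, k[0]), add(scale(y, k[1]), scale(z, k[2]))))


if __name__ == "__main__":
    # Annotation frame: the beak is labeled once and stored relative to the markers.
    calib = marker_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    beak_local = to_local((0.5, 0.2, 0.1), calib)
    # Later frame: the same markers after the bird turned 90 degrees about z and moved.
    current = marker_frame((1.0, 2.0, 0.0), (1.0, 3.0, 0.0), (0.0, 2.0, 0.0))
    print(to_world(beak_local, current))
```

Building the frame from exactly three markers is only exact for noise-free data; with real mo-cap noise, a least-squares fit over all of a bird's markers (for example the Kabsch or Umeyama algorithm) is the more robust choice.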
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses markerless 2D-3D pose tracking of freely moving birds, using homing pigeons as the example species. Specifically:

1. **Data Scarcity**: Large-scale datasets with accurate 3D annotations for markerless pose tracking are scarce, especially for high-resolution images taken from multiple views.
2. **Automated Annotation Method**: The study proposes a semi-automated method that uses a motion capture (mo-cap) system to generate large amounts of 2D and 3D annotations of bird position and posture, computing morphological keypoints (e.g., eyes, beak, tail) relative to markers attached to each bird.
3. **The 3D-POP Dataset**: Using this method, the authors created 3D-POP, which contains approximately 300,000 video frames (4 million instances) of groups of one to ten freely moving birds recorded from four camera views. It provides 2D and 3D keypoint annotations, bounding boxes, and individual identities.
4. **Application Scenarios**: The dataset is intended to support the development of methods for markerless 2D-to-3D pose estimation, trajectory tracking, and individual identification of birds.

In summary, the paper aims to advance pose estimation and tracking in animal behavior research by providing a large, high-quality dataset with 3D annotations.
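The dataset description above pairs the 3D keypoints with 2D keypoints and bounding boxes in each of the four camera views. A standard way to derive such 2D labels from 3D points, assumed here and not confirmed by the summary, is to project every keypoint through a calibrated pinhole camera matrix P = K[R | t] and enclose the projections in a padded box. The sketch below ignores lens distortion; `project`, `keypoint_bbox`, the `margin` parameter, and the camera values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(P: Sequence[Sequence[float]], X: Point3) -> Point2:
    """Project world point X to pixels with a 3x4 pinhole matrix P = K[R | t] (no distortion)."""
    Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w <= 0:
        raise ValueError("point lies behind the camera")
    return (u / w, v / w)


def keypoint_bbox(points: List[Point2], margin: float = 0.0) -> Tuple[float, float, float, float]:
    """Axis-aligned (x_min, y_min, x_max, y_max) box around projected keypoints, padded by margin pixels."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


if __name__ == "__main__":
    # Hypothetical camera: focal length 1000 px, principal point (640, 360), R = I, t = 0.
    P = [[1000.0, 0.0, 640.0, 0.0],
         [0.0, 1000.0, 360.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    keypoints_3d = [(0.1, -0.05, 2.0), (-0.1, 0.05, 2.0)]  # hypothetical keypoints, in meters
    keypoints_2d = [project(P, X) for X in keypoints_3d]
    print(keypoints_2d, keypoint_bbox(keypoints_2d, margin=5.0))
```

A box fitted only to keypoints can cut off wing or tail feathers that extend beyond the labeled points, which is why the sketch exposes a margin; the actual 3D-POP boxes may be computed differently.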