Abstract: Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation efforts. A common key step for behavioral analysis is first extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models that can be used on over 45 species, without additional human labels. Concretely, we introduce a method to unify the keypoint space across differently labeled datasets (via our generalized data converter) and for training on these diverse datasets in a manner such that the models don't catastrophically forget keypoints given the unbalanced inputs (via our keypoint gradient masking and memory replay approaches). These models show excellent performance across six pose benchmarks. Then, to ensure maximal usability for end-users, we demonstrate how to fine-tune the models on differently labeled data and provide tooling for unsupervised video adaptation to boost performance and decrease jitter across frames. If the models are fine-tuned, we show SuperAnimal models are 10-100$\times$ more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models in behavioral classification in mice and gait analysis in horses. Collectively, this presents a data-efficient solution for animal pose estimation.

What problem does this paper attempt to address?

The paper addresses several key issues in animal pose estimation for behavioral analysis:

1. **Building reliable pose models requires expertise and extensive manual annotation**: Supervised pose models typically depend on domain knowledge and a large amount of manual labeling.
2. **Inconsistent annotations across datasets**: Even for the same species, different research teams may label different keypoints or use different names for the same keypoint, which introduces semantic and annotation biases.
3. **Trade-off between data efficiency and flexibility in existing solutions**: Open-source tools such as DeepLabCut have improved data efficiency and adapt to different experimental settings, but they still require manual annotation, especially when users want to define custom keypoints.

To address these issues, the paper proposes SuperAnimal, a method for building unified pretrained models that can be used across many species and environments without additional manual annotations. It relies on the following components:

- **Generalized Data Converter**: Maps the annotations of differently labeled datasets into one common keypoint space, which handles the case where research teams annotate the same species in different ways.
- **Keypoint Gradient Masking**: During training, the network is not penalized for keypoints that a given dataset does not annotate, so it can train on unbalanced inputs without forgetting keypoints it has already learned.
- **Memory Replay**: Combines the model's zero-shot predictions with few-shot fine-tuning so that previously learned keypoints are not catastrophically forgotten, which improves generalization.
- **Video Adaptation Method**: Adapts the model to a video without additional labeled data, improving accuracy and reducing jitter across frames.

Together, these components let SuperAnimal models handle pose estimation for many species, reduce the need for manual annotation, and perform better on new data. When fine-tuned, the models are reported to be 10-100$\times$ more data efficient than prior transfer-learning-based approaches.
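
The data converter and keypoint gradient mask described above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the keypoint names, the `DATASET_TO_UNIFIED` mapping, and the helpers `convert` and `masked_mse` are all hypothetical, and the loss is a plain coordinate-regression error chosen for brevity rather than the loss the authors actually use.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical unified keypoint space (illustrative names only).
UNIFIED_KEYPOINTS = ["nose", "left_ear", "right_ear", "tail_base"]

# Hypothetical per-dataset mappings from local label names to unified names.
DATASET_TO_UNIFIED: Dict[str, Dict[str, str]] = {
    "lab_a": {"snout": "nose", "tailbase": "tail_base"},
    "lab_b": {"nose": "nose", "ear_l": "left_ear", "ear_r": "right_ear"},
}


def convert(dataset: str, labels: Dict[str, Point]) -> Tuple[List[Point], List[float]]:
    """Project dataset-specific labels into the unified keypoint space.

    Returns one target per unified keypoint (a zero placeholder where the
    dataset has no label) and a 0/1 mask marking which targets are real.
    """
    mapping = DATASET_TO_UNIFIED[dataset]
    unified = {mapping[name]: xy for name, xy in labels.items() if name in mapping}
    targets: List[Point] = []
    mask: List[float] = []
    for keypoint in UNIFIED_KEYPOINTS:
        if keypoint in unified:
            targets.append(unified[keypoint])
            mask.append(1.0)
        else:
            targets.append((0.0, 0.0))
            mask.append(0.0)
    return targets, mask


def masked_mse(
    preds: List[Point], targets: List[Point], mask: List[float]
) -> Tuple[float, List[Point]]:
    """Squared error averaged over labeled keypoints, plus its gradient w.r.t. preds.

    Unlabeled keypoints (mask 0) contribute neither loss nor gradient.
    """
    n_valid = max(sum(mask), 1.0)
    loss = 0.0
    grads: List[Point] = []
    for (px, py), (tx, ty), m in zip(preds, targets, mask):
        dx, dy = px - tx, py - ty
        loss += m * (dx * dx + dy * dy)
        grads.append((2.0 * m * dx / n_valid, 2.0 * m * dy / n_valid))
    return loss / n_valid, grads


# Usage: "lab_a" labels only the nose and tail base, so both ears are masked.
targets, mask = convert("lab_a", {"snout": (10.0, 12.0), "tailbase": (40.0, 45.0)})
preds = [(11.0, 12.0), (5.0, 5.0), (6.0, 6.0), (40.0, 43.0)]
loss, grads = masked_mse(preds, targets, mask)
```

In this sketch the ear predictions receive a zero gradient, so training on a dataset that lacks ear labels does not push those outputs toward the placeholder target. That is the intuition behind not penalizing the network for missing keypoints.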

SuperAnimal pretrained pose estimation models for behavioral analysis

SemiMultiPose: A Semi-supervised Multi-animal Pose Estimation Framework

AP-10K: A Benchmark for Animal Pose Estimation in the Wild

DeepBhvTracking: A Novel Behavior Tracking Method for Laboratory Animals Based on Deep Learning

Multi-animal 3D Social Pose Estimation, Identification and Behaviour Embedding with a Few-Shot Learning Framework

Prior-Aware Synthetic Data to the Rescue: Animal Pose Estimation with Very Limited Real Data

Pose Recognition in the Wild: Animal pose estimation using Agglomerative Clustering and Contrastive Learning

Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape

Multi-animal pose estimation, identification and tracking with DeepLabCut

SLEAP: Multi-animal pose tracking

Multi-animal pose estimation and tracking with DeepLabCut

APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond

PMotion: An advanced markerless pose estimation approach based on novel deep learning framework used to reveal neurobehavior

Anti-drift pose tracker (ADPT): A transformer-based network for robust animal pose estimation cross-species

SLEAP: A Deep Learning System for Multi-Animal Pose Tracking

AlphaTracker: A Multi-Animal Tracking and Behavioral Analysis Tool

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

3D Menagerie: Modeling the 3D Shape and Pose of Animals

Social Behavior Atlas: A computational framework for tracking and mapping 3D close interactions of free-moving animals

CNN-Based Action Recognition and Pose Estimation for Classifying Animal Behavior from Videos: A Survey

Automated Behavioral Analysis Using Instance Segmentation