SuperAnimal pretrained pose estimation models for behavioral analysis

Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Steffen Schneider, Maxime Vidal, Tian Qiu, Alexander Mathis, Mackenzie Weygandt Mathis
2023-12-31
Abstract: Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation efforts. A common key step for behavioral analysis is first extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models that can be used on over 45 species, without additional human labels. Concretely, we introduce a method to unify the keypoint space across differently labeled datasets (via our generalized data converter) and for training on these diverse datasets in a manner such that the models don't catastrophically forget keypoints given the unbalanced inputs (via our keypoint gradient masking and memory replay approaches). These models show excellent performance across six pose benchmarks. Then, to ensure maximal usability for end-users, we demonstrate how to fine-tune the models on differently labeled data and provide tooling for unsupervised video adaptation to boost performance and decrease jitter across frames. If the models are fine-tuned, we show SuperAnimal models are 10-100$\times$ more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models in behavioral classification in mice and gait analysis in horses. Collectively, this presents a data-efficient solution for animal pose estimation.
Computer Vision and Pattern Recognition, Artificial Intelligence, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses several key issues in animal pose estimation for behavioral analysis:
1. **Reliable pose models require expertise and extensive manual annotation**: Building a supervised pose model currently demands domain knowledge and a significant amount of manual labeling.
2. **Annotations are inconsistent across datasets**: Even for the same species, different research teams may label different keypoints or use different names for the same keypoint, which introduces semantic and annotation biases.
3. **Existing solutions trade data efficiency against flexibility**: Open-source tools such as DeepLabCut have improved data efficiency and adapt to different experimental settings, but they still require some manual annotation, especially when users want to define custom keypoints.
To address these issues, the paper proposes SuperAnimal, a method for building unified pretrained models that can be used on over 45 species without additional manual annotations. SuperAnimal relies on the following components:
- **Generalized Data Converter**: Unifies annotations from differently labeled datasets into a common keypoint space, resolving the problem of different teams annotating the same species in different ways.
- **Keypoint Gradient Masking**: During training, the network is not penalized for keypoints that a given dataset does not annotate, so it can train on unbalanced inputs without forgetting keypoints it has already learned.
- **Memory Replay**: Combines zero-shot inference with few-shot learning to avoid catastrophic forgetting of keypoints and to improve generalization.
- **Video Adaptation Method**: Adapts the model to a video without additional labeled data, increasing accuracy and reducing jitter across frames.
Together, these components let SuperAnimal handle pose estimation for many species, substantially reduce the need for manual annotation, and improve performance on new data. When fine-tuned, the models are reported to be 10-100$\times$ more data efficient than prior transfer-learning-based approaches.
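The data converter and keypoint gradient masking ideas described above can be illustrated with a minimal sketch. This is not the authors' implementation: the keypoint vocabulary, the `to_unified` and `masked_mse` helpers, and the use of a coordinate-regression loss (rather than a heatmap loss) are all assumptions made to keep the example small. It shows how a dataset's own keypoint names might be mapped into a shared keypoint space, and how a per-keypoint mask makes unannotated keypoints contribute zero loss and zero gradient.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical unified keypoint vocabulary (illustrative names only).
UNIFIED_KEYPOINTS = ["nose", "left_ear", "right_ear", "tail_base"]


def to_unified(
    labels: Dict[str, Point], name_map: Dict[str, str]
) -> Tuple[List[Point], List[float]]:
    """Map one dataset's labels into the unified keypoint order.

    Returns (targets, mask), where mask[i] is 1.0 if the dataset annotates
    unified keypoint i and 0.0 otherwise.
    """
    renamed = {name_map[k]: v for k, v in labels.items() if k in name_map}
    targets: List[Point] = []
    mask: List[float] = []
    for name in UNIFIED_KEYPOINTS:
        if name in renamed:
            targets.append(renamed[name])
            mask.append(1.0)
        else:
            targets.append((0.0, 0.0))  # placeholder, ignored via the mask
            mask.append(0.0)
    return targets, mask


def masked_mse(
    preds: List[Point], targets: List[Point], mask: List[float]
) -> Tuple[float, List[Point]]:
    """Mean squared error over annotated keypoints only.

    Returns (loss, grads), where grads[i] is d(loss)/d(preds[i]).
    Masked keypoints receive exactly zero gradient.
    """
    n = sum(mask)
    if n == 0:
        return 0.0, [(0.0, 0.0) for _ in preds]
    loss = 0.0
    grads: List[Point] = []
    for (px, py), (tx, ty), m in zip(preds, targets, mask):
        dx, dy = px - tx, py - ty
        loss += m * (dx * dx + dy * dy)
        grads.append((2.0 * m * dx / n, 2.0 * m * dy / n))
    return loss / n, grads


# A dataset that labels only two keypoints, under its own names.
labels = {"snout": (10.0, 20.0), "tailbase": (50.0, 60.0)}
name_map = {"snout": "nose", "tailbase": "tail_base"}
targets, mask = to_unified(labels, name_map)

preds = [(12.0, 20.0), (5.0, 5.0), (7.0, 7.0), (50.0, 57.0)]
loss, grads = masked_mse(preds, targets, mask)
print(mask)   # which unified keypoints this dataset annotates
print(loss)   # loss averaged over annotated keypoints only
print(grads)  # ear keypoints get zero gradient
```

In this sketch the ear predictions are left untouched by the update because the example dataset never labels them, which is the intuition behind training on unbalanced inputs without penalizing missing keypoints.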