Hierarchical Spatial–Temporal Window Transformer for Pose-Based Rodent Behavior Recognition

Zhihao Ru, Feng Duan
DOI: https://doi.org/10.1109/tim.2024.3379081
IF: 5.6
2024-03-30
IEEE Transactions on Instrumentation and Measurement
Abstract: In the fields of neuroscience and pharmacology, understanding rodent behavior is vital for studying the effects of genetic manipulations and pharmacological therapies. Conventional behavior recognition methods based on raw images often struggle with noise, such as changes in lighting conditions and image backgrounds. Pose-based approaches, on the other hand, have demonstrated robustness to these challenges. However, existing pose-based methods rely on manually constructed features, which are time-consuming to design and may not fully exploit the potential of the pose data. In this work, we propose the hierarchical spatial–temporal window transformer network (HSTWFormer), a novel approach that efficiently extracts multiscale, cross-spacetime features from rodent pose data. By adopting a pure Transformer structure, HSTWFormer not only avoids the need for a predefined skeletal topology but also enables adaptive recognition of interactive behaviors between multiple rodents. By merging the features of temporal neighbors, we construct a hierarchical structure with different receptive fields that retains essential information at all scales, enabling the extraction of semantic features from low to high level. Furthermore, a spatial–temporal window attention (STWA) block is introduced to capture correlations between different key points across frames. The STWA blocks extract both short-term and long-term cross-spacetime features by letting information flow between windows through window shifting, which enhances the network's modeling performance. The effectiveness of the proposed HSTWFormer is demonstrated on two datasets, CRIM13 and CalMS21. We achieved accuracies of 79.3% and 69.8% for interactive and overall behaviors, respectively, on the CRIM13 dataset, and 76.4% accuracy on the CalMS21 dataset.
Our method harnesses the wealth of information embedded in key points, showcasing robust modeling capabilities for accurate rodent behavior recognition, and provides a novel and effective approach to assist researchers in neuroscience and pharmacology in better quantifying rodent behavior.
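The two mechanisms the abstract names, window attention over spatial–temporal windows with window shifting and hierarchical merging of temporal neighbors, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The window shape (a block of consecutive frames containing every keypoint of every animal), the half-window cyclic shift borrowed from the Swin Transformer, the pairwise frame merging, the function names (`window_partition`, `window_attention`, `stwa_block`, `temporal_merge`), and the tensor sizes are all hypothetical choices made for illustration. Learned query/key/value projections, multiple heads, and the attention mask Swin applies to wrapped-around tokens are omitted.

```python
import numpy as np

# Minimal sketch, NOT the HSTWFormer reference code. Assumptions: windows are
# blocks of `wt` consecutive frames covering all keypoints; single head, no
# learned Q/K/V projections, no attention mask for wrapped-around frames.

def window_partition(x, wt):
    """(T, N, C) pose tokens -> (T // wt, wt * N, C) windows.

    Each window flattens wt frames x N keypoints into one token set, so
    attention inside it spans both space (keypoints) and time (frames)."""
    T, N, C = x.shape
    assert T % wt == 0, "T must be divisible by the window length wt"
    return x.reshape(T // wt, wt * N, C)


def window_attention(windows):
    """Scaled dot-product self-attention computed independently per window."""
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(windows.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ windows


def stwa_block(x, wt, shift):
    """Hypothetical STWA-style block on (T, N, C) input.

    With shift=True, frames are cyclically rolled by wt // 2 before
    partitioning (Swin-style) and rolled back afterwards, so window
    borders move and neighbouring windows exchange information."""
    T, N, C = x.shape
    s = wt // 2 if shift else 0
    out = window_attention(window_partition(np.roll(x, -s, axis=0), wt))
    return np.roll(out.reshape(T, N, C), s, axis=0)


def temporal_merge(x, w):
    """Hypothetical merge of temporal neighbours: concatenate the features of
    frames 2k and 2k+1 per keypoint and project with w of shape (2C, C_out),
    giving (T // 2, N, C_out)."""
    T, N, C = x.shape
    pairs = x.reshape(T // 2, 2, N, C).transpose(0, 2, 1, 3).reshape(T // 2, N, 2 * C)
    return pairs @ w


# Usage with hypothetical sizes: 16 frames, 14 keypoint tokens, 8 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 14, 8))
h = stwa_block(x, wt=4, shift=False)   # attention inside fixed windows
h = stwa_block(h, wt=4, shift=True)    # shifted windows cross the borders
h = temporal_merge(h, rng.standard_normal((16, 16)))
print(h.shape)  # (8, 14, 16)
```

Because every keypoint of every animal is just a token, nothing in this sketch depends on a skeleton graph, which is the property the abstract attributes to the pure Transformer design. Alternating unshifted and shifted blocks lets frames near a window border attend across it, and each merge halves the number of frames, so later stages cover longer time spans per token.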
engineering, electrical & electronic, instruments & instrumentation