Causality-Enhanced Multiple Instance Learning With Graph Convolutional Networks for Parkinsonian Freezing-of-Gait Assessment

Rui Guo, Zheng Xie, Chencheng Zhang, Xiaohua Qian
DOI: https://doi.org/10.1109/TIP.2024.3416052
Abstract: Freezing of gait (FoG) is a common disabling symptom of Parkinson's disease (PD). It is clinically characterized by sudden, transient interruptions of walking that involve specific body parts, and it is localized in both time and space. Because global fine-grained features are difficult to extract from lengthy videos, developing an automated five-point FoG scoring system is quite challenging. Therefore, we propose a novel video-based automated five-classification FoG assessment method with a causality-enhanced multiple-instance-learning graph convolutional network (GCN). The method first develops a temporal segmentation GCN that segments each video into three motion stages for stage-level feature modeling, followed by a multiple-instance-learning framework that divides each stage into short clips for instance-level feature extraction. Subsequently, an uncertainty-driven multiple-instance-learning GCN is developed to capture spatial and temporal fine-grained features through a GCN scheme and uncertainty learning, respectively, in order to acquire global representations. Finally, a causality-enhanced graph generation strategy is proposed that exploits causal inference to mine and enhance the human structures causally related to clinical assessment, thereby extracting spatial causal features. Extensive experimental results demonstrate the excellent performance of the proposed method on five-classification FoG assessment, with an accuracy of 62.72% and an acceptable accuracy of 91.32%, which is confirmed by independent testing. Additionally, the method enables temporal and spatial localization of FoG events to a certain extent, facilitating reasonable clinical interpretations. In conclusion, our method provides a valuable tool for automated FoG assessment in PD, and the proposed causality-related component shows promising potential for extension to other general and medical fine-grained action recognition tasks.
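The abstract describes dividing each motion stage into short clips that serve as instances in a multiple-instance-learning bag. The sketch below illustrates only that slicing step in plain Python, as a rough reading of the idea rather than the authors' implementation. The function name `split_into_clips`, the `clip_len` and `stride` parameters, and the per-frame representation are all assumptions introduced here; the abstract does not give clip lengths, overlap, or input format.

```python
from typing import List, Sequence

# Hypothetical frame type: one flattened skeleton pose per video frame.
Frame = Sequence[float]


def split_into_clips(
    stage: Sequence[Frame], clip_len: int, stride: int
) -> List[List[Frame]]:
    """Slide a fixed-length window over one motion stage.

    Each returned clip is one instance; the list of clips is the bag
    for that stage. clip_len and stride are illustrative parameters,
    not values taken from the paper.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    clips: List[List[Frame]] = []
    # Stop once a full-length window no longer fits inside the stage.
    for start in range(0, len(stage) - clip_len + 1, stride):
        clips.append(list(stage[start:start + clip_len]))
    return clips


# Toy stage: 10 frames, each a 2-value placeholder pose.
toy_stage = [[float(i), float(i) + 0.5] for i in range(10)]
bag = split_into_clips(toy_stage, clip_len=4, stride=3)
print(len(bag), [clip[0][0] for clip in bag])
```

In a full pipeline, each clip would be passed to a feature extractor and the instance features aggregated into a stage-level representation; how the paper weights instances through uncertainty learning is not specified in the abstract and is not modeled here.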