Machine learning for automating subjective clinical assessment of gait impairment in people with acquired brain injury - a comparison of an image extraction and classification system to expert scoring

Ashleigh Mobbs, Michelle Kahn, Gavin Williams, Benjamin F Mentiplay, Yong-Hao Pua, Ross A Clark
DOI: https://doi.org/10.1186/s12984-024-01406-w
2024-07-23
Abstract:
Background: Walking impairment is a common disability after acquired brain injury (ABI), and visually evident arm movement abnormality has been identified as negatively affecting a multitude of psychological factors. The International Classification of Functioning, Disability and Health (ICF) qualifiers scale has been used to subjectively assess arm movement abnormality, showing strong intra-rater and test-retest reliability but only moderate inter-rater reliability. This limits its clinical utility as a measurement tool. To both automate the analysis and overcome these errors, the primary aim of this study was to evaluate the ability of a novel two-level machine learning model to assess arm movement abnormality during walking in people with ABI.
Methods: Frontal plane gait videos were used to train four networks with 50%, 75%, 90%, and 100% of participants (ABI: n = 42, healthy controls: n = 34) to automatically identify anatomical landmarks using DeepLabCut™ and calculate two-dimensional kinematic joint angles. Assessment scores from three experienced neurorehabilitation clinicians were used with these joint angles to train random forest networks with nested cross-validation to predict assessor scores for all videos. Agreement between the predictions for unseen participants (i.e., test group participants who were not used to train the model) and each individual assessor's scores was assessed using quadratic weighted kappa. One-sample t-tests (to determine over- or under-prediction against clinician ratings) and one-way ANOVA (to determine differences between networks) were applied to the four networks.
Results: The machine learning predictions showed similar agreement to that of experienced human assessors, with no statistically significant difference (significance threshold p < 0.05) for any match contingency. There was no statistically significant difference between the predictions from the four networks (F = 0.119; p = 0.949). The four networks did, however, under-predict scores with small effect sizes (p range = 0.007 to 0.040; Cohen's d range = 0.156 to 0.217).
Conclusions: This study demonstrated that machine learning can perform similarly to experienced clinicians when subjectively assessing arm movement abnormality in people with ABI. The relatively small sample size may have resulted in under-prediction of some scores, albeit with small effect sizes. Larger studies are needed that objectively and automatically assess dynamic movement in both local and telerehabilitation assessments, for example using smartphones and edge-based machine learning, to reduce measurement error and healthcare access inequality.
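As a rough illustration of two quantities named in the Methods, the sketch below computes a two-dimensional joint angle from three landmark coordinates and a quadratic weighted kappa between two sets of ordinal scores. It is not the authors' implementation. The function names `joint_angle_2d` and `quadratic_weighted_kappa`, the landmark layout, and the assumption that scores are integers from 0 to `n_categories - 1` are all hypothetical choices made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of one landmark


def joint_angle_2d(proximal: Point, joint: Point, distal: Point) -> float:
    """Unsigned angle in degrees at `joint` between the two adjoining segments.

    Hypothetical helper: e.g. shoulder, elbow, wrist landmarks for an elbow angle.
    """
    v1 = (proximal[0] - joint[0], proximal[1] - joint[1])
    v2 = (distal[0] - joint[0], distal[1] - joint[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2 on (|cross|, dot) is numerically stable for angles near 0 and 180
    return math.degrees(math.atan2(abs(cross), dot))


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], n_categories: int
) -> float:
    """Cohen's kappa with quadratic disagreement weights.

    Assumes scores are integers in 0 .. n_categories - 1 (an assumption of
    this sketch, not a detail taken from the abstract).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    k, n = n_categories, len(rater_a)

    # Observed contingency table: rows are rater A, columns are rater B
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / n  # chance agreement count
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used one identical category: kappa is undefined
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected
```

In this sketch, a kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and larger score differences are penalised more heavily than adjacent-category differences, which is why a weighted kappa suits an ordinal rating scale.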