Machine learning analyses of automated performance metrics during granular sub-stitch phases predict surgeon experience

Andrew B. Chen, Siqi Liang, Jessica H. Nguyen, Yan Liu, Andrew J. Hung
DOI: https://doi.org/10.1016/j.surg.2020.09.020
IF: 4.348
2021-05-01
Surgery
Abstract:<p>Automated performance metrics objectively measure surgeon performance during a robot-assisted radical prostatectomy. Machine learning has shown that automated performance metrics, especially those recorded during the vesico-urethral anastomosis of the robot-assisted radical prostatectomy, predict long-term outcomes such as continence recovery time. This study examines automated performance metrics during the vesico-urethral anastomosis at the <em>stitch</em> versus <em>sub-stitch</em> level to distinguish surgeon experience. During the vesico-urethral anastomosis, automated performance metrics recorded by a systems data recorder (Intuitive Surgical, Sunnyvale, CA, USA) were reported for each overall stitch (C<sup>total</sup>) and for its individual components: needle handling/targeting (C<sup>1</sup>), needle driving (C<sup>2</sup>), and suture cinching (C<sup>3</sup>) (Fig 1, <em>A</em>). These metrics were organized into three datasets, GlobalSet (whole stitch), RowSet (independent sub-stitches), and ColumnSet (associated sub-stitches) (Fig 1, <em>B</em>), and applied to three machine learning models (AdaBoost, gradient boosting, and random forest) to solve two classification tasks: experts (≥100 cases) versus novices (&lt;100 cases), and ordinary experts (≥100 and &lt;2,000 cases) versus super experts (≥2,000 cases). Classification accuracy was determined using analysis of variance. Input features were evaluated with a Jaccard index. From 68 vesico-urethral anastomoses, we analyzed 1,570 stitches broken down into 4,708 sub-stitches. For both classification tasks, ColumnSet best distinguished experts (<em>n</em> = 8) from novices (<em>n</em> = 9) and ordinary experts (<em>n</em> = 5) from super experts (<em>n</em> = 3), with accuracies of 0.774 and 0.844, respectively. Feature ranking identified EndoWrist articulation and <em>needle handling/targeting</em> as the most important features for classification.
Surgeon performance measured by automated performance metrics on a granular sub-stitch level more accurately distinguishes expertise when compared with summary automated performance metrics over whole stitches.</p>
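The abstract names three dataset layouts, two case-volume thresholds, and a Jaccard index, but it does not spell out how each is built. The sketch below is one plausible reading, not the authors' pipeline. It assumes that RowSet treats each sub-stitch (C1, C2, C3) as its own sample row, that ColumnSet concatenates a stitch's three sub-stitches into one wide row, and that the Jaccard index compares two sets of top-ranked features. The metric names (`path_length`, `wrist_articulation`) and every function name are hypothetical, and the classifiers themselves are not shown.

```python
from typing import Dict, List, Set

# Sub-stitch phases from the abstract: needle handling/targeting (C1),
# needle driving (C2), and suture cinching (C3).
PHASES = ("C1", "C2", "C3")


def experience_label(cases: int, task: str) -> str:
    """Label a surgeon by case volume using the thresholds stated in the abstract."""
    if task == "expert_vs_novice":
        return "expert" if cases >= 100 else "novice"
    if task == "ordinary_vs_super":
        if cases < 100:
            raise ValueError("this task only covers experts (>= 100 cases)")
        return "super" if cases >= 2000 else "ordinary"
    raise ValueError(f"unknown task: {task}")


def global_row(stitch_total: Dict[str, float]) -> List[float]:
    """GlobalSet (assumed layout): one row per stitch from whole-stitch metrics."""
    return [stitch_total[k] for k in sorted(stitch_total)]


def row_rows(substitches: Dict[str, Dict[str, float]]) -> List[List[float]]:
    """RowSet (assumed layout): each sub-stitch becomes an independent row."""
    return [[substitches[p][k] for k in sorted(substitches[p])] for p in PHASES]


def column_row(substitches: Dict[str, Dict[str, float]]) -> List[float]:
    """ColumnSet (assumed layout): the three sub-stitches side by side in one row."""
    return [substitches[p][k] for p in PHASES for k in sorted(substitches[p])]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B| of two feature sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical metrics for a single stitch, split into its three sub-stitches.
sub = {
    "C1": {"path_length": 1.0, "wrist_articulation": 2.0},
    "C2": {"path_length": 3.0, "wrist_articulation": 4.0},
    "C3": {"path_length": 5.0, "wrist_articulation": 6.0},
}
print(row_rows(sub))    # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(column_row(sub))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Under this reading, ColumnSet keeps the link between a stitch's sub-stitches within a single sample, whereas RowSet triples the sample count but treats each phase as unrelated to the others.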