Road to automating robotic suturing skills assessment: Battling mislabeling of the ground truth

Sirisha Rambhatla, Nilay Pachauri, Erik Vanstrum, Yan Liu, Andrew J. Hung, Daniel I. Sanford, Jessica H. Nguyen
DOI: https://doi.org/10.1016/j.surg.2021.08.014
IF: 4.348
2022-04-01
Surgery
Abstract:
Objective: To automate surgeon skills evaluation using robotic instrument kinematic data. Additionally, to implement an unsupervised mislabeling detection algorithm to identify potentially mislabeled samples that can be removed to improve model performance.
Methods: Video recordings and instrument kinematic data were derived from suturing exercises completed on the Mimic FlexVR robotic simulator. A structured human consensus-building process was developed to determine Robotic Anastomosis Competency Evaluation technical scores across 3 human graders. A 2-layer long short-term memory–based classification model used instrument kinematic data to automate suturing skills assessment. An unsupervised label analyzer (NoiseRank) was used to identify potential mislabeling of skills data. Performance of the long short-term memory model's technical skill score prediction was measured by best area under the curve over the training runs. NoiseRank outputted a ranked list of rated skills assessments based on likelihood of mislabeling.
Results: 22 surgeons performed 226 suturing attempts, which were broken down into 1,404 individual skill assessment points. Automation of needle entry angle, needle driving, and needle withdrawal technical skill scores performed better (area under the curve 0.698–0.705) than needle positioning (0.532) at baseline using all available data. Potential mislabels were subsequently identified by NoiseRank and removed, improving model performance across all domains (area under the curve 0.551–0.766).
Conclusion: Using ground truth labels from human graders and robotic instrument kinematic data, machine learning models have automated assessment of detailed suturing technical skills with good performance. Further, an unsupervised mislabeling detection algorithm projected mislabeled data, allowing for their removal and subsequent improvement of model performance.
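The abstract reports area under the curve before and after removing samples that a ranked list flags as likely mislabeled. The following is a minimal sketch of that comparison only, not the authors' pipeline: the data are invented, `ranked_suspects` is a hypothetical stand-in for a NoiseRank-style ranking, `roc_auc` and `drop_top_suspects` are illustrative helper names, and the scores are held fixed rather than produced by a retrained model.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def drop_top_suspects(n_samples: int, ranked_suspects: Sequence[int], k: int) -> List[int]:
    """Return the indices kept after removing the k most suspicious samples."""
    removed = set(ranked_suspects[:k])
    return [i for i in range(n_samples) if i not in removed]


# Toy data, made up for illustration (not from the paper).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.15, 0.85]
# Hypothetical ranking of sample indices, most suspicious first.
ranked_suspects = [7, 6, 2]

auc_all = roc_auc(labels, scores)
kept = drop_top_suspects(len(labels), ranked_suspects, k=2)
auc_clean = roc_auc([labels[i] for i in kept], [scores[i] for i in kept])

print(f"AUC, all samples: {auc_all:.3f}")
print(f"AUC, top-2 suspects removed: {auc_clean:.3f}")
```

In this toy case the two flagged samples are exactly the ones whose labels disagree with their scores, so removing them raises the AUC. With real data the gain depends on how well the ranking matches the true labeling errors.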