Deep Learning with Convolutional Neural Network for Objective Skill Evaluation in Robot-assisted Surgery

Ziheng Wang, Ann Majewicz Fey
DOI: https://doi.org/10.1007/s11548-018-1860-1
2019-03-07
Abstract: With the advent of robot-assisted surgery, the role of data-driven approaches that integrate statistics and machine learning is growing rapidly, with prominent interest in objective surgical skill assessment. However, most existing work requires translating robot motion kinematics into intermediate features or gesture segments that are expensive to extract, lack efficiency, and require significant domain-specific knowledge. We propose an analytical deep learning framework for skill assessment in surgical training. A deep convolutional neural network is implemented to map multivariate time series data of the motion kinematics to individual skill levels. We perform experiments on the public minimally invasive surgical robotic dataset, JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our proposed learning model achieved competitive accuracies of 92.5%, 95.4%, and 91.3% in the standard training tasks Suturing, Needle-passing, and Knot-tying, respectively. Without the need for engineered features or carefully tuned gesture segmentation, our model can successfully decode skill information from raw motion profiles via end-to-end learning. Meanwhile, the proposed model is able to reliably interpret skills within a 1-3 second window, without needing to observe the entire training trial. This study highlights the potential of deep architectures for proficient online skill assessment in modern surgical training.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses objective skill assessment in robot-assisted surgery. Existing skill assessment methods usually need to convert the robot's kinematic data into intermediate features or gesture segments. That conversion is costly and inefficient, and it requires a great deal of domain-specific knowledge. To solve these problems, the authors propose an analytical framework based on deep learning: a convolutional neural network (CNN) maps multivariate time-series motion kinematics directly to individual skill levels, achieving end-to-end learning without manually designed features. The method aims to improve the accuracy, efficiency, and reliability of assessment. In minimally invasive surgical training in particular, it can reliably interpret skill within a 1-3 second time window, without needing to observe the entire training trial. The study also explores data augmentation to counter the overfitting caused by small datasets and to improve the model's generalization. In experiments on the publicly available minimally invasive surgical robot dataset JIGSAWS, the model reached accuracies of 92.5%, 95.4%, and 91.3% on the three standard training tasks of suturing, needle-passing, and knot-tying, respectively, demonstrating its potential in modern surgical training.
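To make the short-window idea concrete, here is a minimal sketch of how one trial's multivariate kinematic recording could be cropped into fixed-length, overlapping segments that each inherit the trial's skill label. This is an illustration under stated assumptions, not the authors' implementation: the function names `sliding_windows` and `make_dataset`, the 30 Hz sampling rate, the 76-channel layout, and the stride are all hypothetical choices that the summary above does not specify.

```python
import numpy as np


def sliding_windows(trial, window, stride):
    """Crop a (time_steps, channels) array into overlapping (window, channels) segments."""
    trial = np.asarray(trial)
    if trial.ndim != 2:
        raise ValueError("expected a 2-D array of shape (time_steps, channels)")
    n_steps, n_channels = trial.shape
    if n_steps < window:
        # Trial is shorter than one window: return an empty batch of the right shape.
        return np.empty((0, window, n_channels), dtype=trial.dtype)
    starts = range(0, n_steps - window + 1, stride)
    return np.stack([trial[s:s + window] for s in starts])


def make_dataset(trials, labels, window, stride):
    """Crop every trial and give each crop the skill label of its source trial."""
    segments, segment_labels = [], []
    for trial, label in zip(trials, labels):
        crops = sliding_windows(trial, window, stride)
        segments.append(crops)
        segment_labels.extend([label] * len(crops))
    return np.concatenate(segments), np.array(segment_labels)


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 30           # assumed sampling rate, not stated in the text above
    WINDOW = 2 * SAMPLE_RATE_HZ   # a 2-second window, inside the 1-3 second range
    STRIDE = SAMPLE_RATE_HZ       # assumed 1-second hop between consecutive crops

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two trials with 76 kinematic channels (assumed layout).
    trials = [rng.normal(size=(300, 76)), rng.normal(size=(450, 76))]
    labels = [0, 2]               # hypothetical encoding, e.g. 0 = novice, 2 = expert

    X, y = make_dataset(trials, labels, WINDOW, STRIDE)
    print(X.shape, y.shape)
```

Each row of `X` is one short segment that a classifier could score on its own, which is what would allow a prediction without waiting for the full trial. Overlapping crops also multiply the number of training examples, one plausible way to augment a small dataset.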