Abstract:

Introduction: Remote military operations require rapid response times for effective relief and critical care. Yet the military theater is austere: communication links are unreliable and subject to physical and virtual attacks and to degradation at unpredictable times. Immediate medical care at these locations requires semi-autonomous teleoperated systems, which allow medical procedures to be completed even over interrupted networks while isolating medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require either extensive programming or massive libraries of surgical skill demonstrations from which machine learning algorithms can learn effective policies. Although such datasets are achievable for simple tasks, providing a large number of demonstrations for surgical maneuvers is not practical. This article presents a method for learning from demonstration that uses knowledge from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, in conjunction with expert knowledge-driven rewards, produces more generalizable policies that tolerate dynamic environment changes. A multimodal representation for interaction enables learning complex, contact-rich surgical maneuvers. The effectiveness of the approach is shown on the cricothyroidotomy task, a standard critical care procedure for opening the airway. We also provide a method for segmenting the teleoperator's demonstration into subtasks and classifying the subtasks using sequence modeling.

Materials and methods: A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform, SuperBaxter, fitted with modified surgical grippers. Two learning models were then developed for processing the dataset: one for automatically segmenting the task demonstrations into a sequence of surgemes, and a second for classifying each segment into labeled surgemes. Finally, a multimodal off-policy RL method with rewards learned from demonstrations was developed to learn surgeme execution from these demonstrations.

Results: The task segmentation model has an accuracy of 98.2%. The surgeme classification model using the proposed interaction features achieved a classification accuracy of 96.25% averaged across all surgemes, compared to 87.08% without these features and 85.4% using a support vector machine classifier. Finally, the robot execution achieved a task success rate of 93.5%, compared to baselines of behavioral cloning (78.3%) and a twin-delayed deep deterministic policy gradient with shaped rewards (82.6%).

Conclusions: Results indicate that the proposed interaction features for the segmentation and classification of surgical tasks improve classification accuracy. The proposed method for learning surgemes from demonstrations outperforms popular methods for skill learning. The effectiveness of the proposed approach demonstrates its potential for future remote telemedicine on battlefields.

Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery

Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data

Uncertainty-Aware Self-Supervised Learning for Cross-Domain Technical Skill Assessment in Robot-Assisted Surgery

Multimodal semi-supervised learning for online recognition of multi-granularity surgical workflows

Gesture Recognition in Robotic Surgery With Multimodal Attention

Multi-Modal Self-Supervised Learning for Surgical Feedback Effectiveness Assessment

Semi-Supervised Learning for Surface EMG-based Gesture Recognition

Deep Learning with Convolutional Neural Network for Objective Skill Evaluation in Robot-assisted Surgery

Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment

Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

Hand gesture based control with multi-modality data - towards surgical applications

General-purpose foundation models for increased autonomy in robot-assisted surgery

Evaluating the Task Generalization of Temporal Convolutional Networks for Surgical Gesture and Motion Recognition using Kinematic Data

Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces

Learning Autonomous Ultrasound via Latent Task Representation and Robotic Skills Adaptation

Toward Personalized Training and Skill Assessment in Robotic Minimally Invasive Surgery

ASAP-CORPS: A Semi-Autonomous Platform for COntact-Rich Precision Surgery

Machine Learning-Based Surgical State Perception and Collaborative Control for a Vascular Interventional Robot

Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation