Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

Min Tan, Yushun Tao, Boyun Zheng, GaoSheng Xie, Lijuan Feng, Zeyang Xia, Jing Xiong
2024-09-24
Abstract: With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, we introduce an Enhanced Exploration Mechanism (EEM) to address the low exploration efficiency of the standard PPO. Additionally, a reward-penalty adjustment (RPA) is implemented to penalize unsafe actions during initial interventions. Furthermore, Behavior Cloning Similarity (BCS) is included as an auxiliary objective to ensure the agent emulates expert actions. Comparative experiments conducted on a simulated platform across various anatomical colon segments demonstrate that our model effectively and safely guides RDE.
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses safe and efficient navigation of robotic digestive endoscopy (RDE) in the unstructured, narrow digestive tract. Existing reinforcement learning navigation algorithms can cause risky collisions in clinical use because they lack human intervention, which limits the safety and effectiveness of RDE. To overcome this limitation, the authors propose HI-PPO, a Proximal Policy Optimization (PPO) framework based on human intervention. The framework incorporates expert knowledge to improve RDE safety through three mechanisms:

1. **Enhanced Exploration Mechanism (EEM)**: Improves the exploration efficiency of standard PPO.
2. **Reward-Penalty Adjustment (RPA)**: Promotes safer policy learning by penalizing unsafe behaviors during initial interventions.
3. **Behavior Cloning Similarity (BCS)**: Serves as an auxiliary objective that pushes the agent to mimic expert actions, improving learning performance in complex environments.

Experiments on a simulated platform show that HI-PPO can effectively guide RDE through various anatomical colon segments, significantly reducing the number of collisions and improving safety. This indicates that the method not only enhances accuracy but also provides a safer navigation solution in complex surgical environments.
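To make the interplay of the reward penalty and the behavior-cloning objective more concrete, here is a minimal Python sketch of how such terms could be attached to a standard PPO clipped surrogate. It is not the authors' implementation: the abstract does not give the formulas, so the fixed penalty in `shaped_reward`, the cosine-similarity form of the BCS term, the `bc_weight` coefficient, and all function and field names are assumptions made for illustration. EEM is left out because the source does not describe how it works.

```python
import math


def shaped_reward(env_reward, intervened, penalty=1.0):
    """Assumed RPA form: subtract a fixed penalty on steps where the expert took over."""
    return env_reward - penalty if intervened else env_reward


def ppo_clip_term(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample (a quantity to maximize)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def cosine_similarity(a, b):
    """Cosine similarity between two action vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hi_ppo_loss(samples, clip_eps=0.2, bc_weight=0.5):
    """Mean loss over samples: negative clipped surrogate, plus an assumed
    behavior-cloning term (1 - cosine similarity) on intervened steps only."""
    total = 0.0
    for s in samples:
        loss = -ppo_clip_term(s["ratio"], s["advantage"], clip_eps)
        if s["intervened"]:
            similarity = cosine_similarity(s["agent_action"], s["expert_action"])
            loss += bc_weight * (1.0 - similarity)
        total += loss
    return total / len(samples)


if __name__ == "__main__":
    # Hypothetical batch: one autonomous step and one step corrected by the expert.
    batch = [
        {"ratio": 1.1, "advantage": 0.5, "intervened": False,
         "agent_action": [1.0, 0.0], "expert_action": [1.0, 0.0]},
        {"ratio": 0.9, "advantage": shaped_reward(0.2, True), "intervened": True,
         "agent_action": [1.0, 0.0], "expert_action": [0.0, 1.0]},
    ]
    print(hi_ppo_loss(batch))
```

In this sketch the penalized reward is used directly as the advantage only to keep the example short; a real PPO pipeline would first compute advantages from the shaped rewards, for instance with generalized advantage estimation.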