Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings

Ruijie Tang, Beilei Cui, Hongliang Ren
2024-06-19
Abstract: As the significance of simulation in medical care and intervention continues to grow, it is anticipated that a simplified and low-cost platform can be set up to execute personalized diagnoses and treatments. 3D Slicer can not only perform medical image analysis and visualization but can also provide surgical navigation and surgical planning functions. In this paper, we chose 3D Slicer as our base platform and used monocular cameras as sensors. We then used the neural radiance fields (NeRF) algorithm to complete the 3D model reconstruction of the human head. We compared the accuracy of the NeRF algorithm in generating 3D human head scenes and utilized the Marching Cubes algorithm to generate corresponding 3D mesh models. The individual's head pose, obtained through single-camera vision, is transmitted in real time to the scene created within 3D Slicer. The demonstrations presented in this paper include real-time synchronization of transformations between the human head model in the 3D Slicer scene and the detected head posture. Additionally, we tested a scene where a tool, marked with an ArUco marker and tracked by a single camera, synchronously points to the real-time transformation of the head posture. These demos indicate that our methodology can provide a feasible real-time simulation platform for nasopharyngeal swab collection or intubation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is to achieve high-precision 3D head model reconstruction and real-time pose estimation in medical scenarios through simplified and low-cost methods, in order to support the safe navigation of operations such as nasopharyngeal swab sampling or tracheal intubation. Specifically:

1. **Simplified and low-cost 3D reconstruction**:
   - Traditional 3D reconstruction methods (such as using 3D scanners or optical tracking devices) are costly and complex. In order to reduce costs and improve operability, the paper proposes a method based on a monocular camera and the Neural Radiance Field (NeRF) algorithm to complete high-quality 3D head model reconstruction.
2. **Real-time pose estimation and synchronization**:
   - The paper explores how to capture a person's head pose (such as pitch angle, yaw angle, roll angle) in real time through a monocular camera and synchronize this pose information in real time to the 3D head model in the 3D Slicer platform. This enables doctors to simulate actual operations in a virtual environment, thereby improving the safety and accuracy of surgeries.
3. **Application in medical scenarios**:
   - Especially during the epidemic, in order to avoid the risk of infection for medical staff, the paper proposes unmanned nasopharyngeal swab sampling and tracheal intubation schemes. Through digital twin technology, accurate modeling of the patient's head and pose monitoring can be achieved, thereby guiding robots to perform safe operations.

### Specific implementation methods

1. **3D model reconstruction**:
   - Use the NeRF algorithm to generate a 3D head model from multi-angle photos taken by a monocular camera. NeRF maps 3D position \( \mathbf{x} \) and viewing direction \( \mathbf{d} \) to color \( c \) and volume density \( \sigma \) through a fully-connected network (MLP), and the formula is as follows:
     \[ (c, \sigma) = F_\Theta(\gamma_L(\mathbf{x}), \gamma_L(\mathbf{d})) \]
     where \( F_\Theta \) represents an MLP with parameters \( \Theta \), and \( \gamma_L(\cdot) \) represents the position-encoding function.
   - Use the Marching Cubes algorithm to convert the implicit representation into an explicit 3D mesh model.
2. **Head pose estimation**:
   - Detect facial feature points through OpenCV's DNN module and use the Levenberg-Marquardt optimization algorithm to calculate the transformation matrices \( R \) and \( \mathbf{t} \) between the camera coordinate system and the world coordinate system, thereby determining the head pose. The formulas are as follows:
     \[ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = h \begin{bmatrix} R & \mathbf{t} \end{bmatrix} \begin{bmatrix} U \\ V \\ W \\ 1 \end{bmatrix} \]
     \[ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = s \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \]
3. **Registration and synchronization**:
   - Align the real-time-captured facial image with the 3D head model to ensure that they are on the same plane and maintain a consistent scale and rotation angle.
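
The position-encoding function \( \gamma_L \) used in the NeRF mapping is not spelled out in this summary. The sketch below assumes the standard sinusoidal encoding from the original NeRF formulation, in which each coordinate \( p \) is expanded into \( \sin(2^k \pi p) \) and \( \cos(2^k \pi p) \) for \( k = 0, \dots, L-1 \). The function name `positional_encoding` and the frequency counts (10 for position, 4 for direction) are illustrative assumptions, not values confirmed by the paper.

```python
import math
from typing import List, Sequence


def positional_encoding(p: Sequence[float], num_freqs: int) -> List[float]:
    """Sinusoidal encoding gamma_L applied to every coordinate of p.

    Assumed form (standard NeRF encoding, not confirmed by the paper):
    each coordinate yields sin(2^k * pi * p) and cos(2^k * pi * p)
    for k = 0 .. num_freqs - 1, so the output has 2 * num_freqs values
    per input coordinate.
    """
    encoded: List[float] = []
    for coord in p:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * coord
            encoded.append(math.sin(angle))
            encoded.append(math.cos(angle))
    return encoded


# Hypothetical sample: a 3D position x and a unit viewing direction d.
x = (0.25, -0.5, 0.1)
d = (0.0, 0.0, 1.0)

# The concatenated features are what an MLP F_Theta would take as input.
mlp_input = positional_encoding(x, 10) + positional_encoding(d, 4)
```

With these assumed settings the MLP input has 3 × 2 × 10 + 3 × 2 × 4 = 84 features. The ordering of the features (grouped by coordinate here) is a free choice, since the first MLP layer can absorb any fixed permutation.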
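
The two head pose equations can be read as a two-stage pipeline: a rigid transform from model (world) coordinates \( (U, V, W) \) to camera coordinates \( (X, Y, Z) \), followed by a pinhole projection to pixel coordinates \( (x, y) \). The sketch below is a minimal pure-Python illustration of that forward model, assuming \( h = 1 \) and taking \( s = 1/Z \) (the usual pinhole normalization). The function names and the sample intrinsics are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def world_to_camera(point: Vec3, R: Mat3, t: Vec3) -> Tuple[float, float, float]:
    """Apply [R | t] to a model point (U, V, W), assuming scale h = 1."""
    X, Y, Z = (
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )
    return X, Y, Z


def camera_to_pixel(
    cam: Vec3, fx: float, fy: float, cx: float, cy: float
) -> Tuple[float, float]:
    """Pinhole projection with intrinsics (fx, fy, cx, cy), taking s = 1 / Z."""
    X, Y, Z = cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return fx * X / Z + cx, fy * Y / Z + cy


# Hypothetical setup: head facing the camera (identity rotation),
# 1000 units in front of it, with made-up intrinsics.
R = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
t = (0.0, 0.0, 1000.0)
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0

landmark = (100.0, 0.0, 0.0)  # a model point 100 units to one side
pixel = camera_to_pixel(world_to_camera(landmark, R, t), fx, fy, cx, cy)
print(pixel)  # (400.0, 240.0)
```

Pose estimation runs this model in reverse: given detected 2D landmarks and their known 3D model positions, an optimizer searches for the \( R \) and \( \mathbf{t} \) that minimize the reprojection error. In OpenCV, `cv2.solvePnP` with the `SOLVEPNP_ITERATIVE` flag is one Levenberg-Marquardt-based solver of this kind; the summary does not state which routine the authors call.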