Multiscale geometric window transformer for orthodontic teeth point cloud registration

Hao Wang,Yan Tian,Yongchuan Xu,Jiahui Xu,Tao Yang,Yan Lu,Hong Chen
DOI: https://doi.org/10.1007/s00530-024-01369-x
IF: 3.9
2024-06-01
Multimedia Systems
Abstract:Digital orthodontic treatment monitoring has been gaining increasing attention in the past decade. However, current methods based on deep learning still face difficult challenges. Transformer, due to its excellent ability to model long-term dependencies, can be applied to the task of tooth point cloud registration. Nonetheless, most transformer-based point cloud registration networks suffer from two problems. First, they lack the embedding of credible geometric information, resulting in learned features that are not geometrically discriminative and blur the boundary between inliers and outliers. Second, the attention mechanism lacks continuous downsampling during geometric transformation invariant feature extraction at the superpixel level, thereby limiting the field of view and potentially limiting the model's perception of local and global information. In this paper, we propose GeoSwin, which uses a novel geometric window transformer to achieve accurate registration of tooth point clouds in different stages of orthodontic treatment. This method uses the point distance, normal vector angle, and bidirectional spatial angular distances as the input geometric embedding of transformer, and then uses a proposed variable multiscale attention mechanism to achieve geometric information perception from local to global perspectives. Experiments on the Shing3D Dental Dataset demonstrate the effectiveness of our approach and that it outperforms other state-of-the-art approaches across multiple metrics. Our code and models are available at GeoSwin.
computer science, information systems, theory & methods
What problem does this paper attempt to address?