Abstract: Thyroid ultrasound video provides significant value for the diagnosis of thyroid diseases, but the ultrasound imaging process is often affected by speckle noise, resulting in poor video quality. Numerous video denoising methods have been proposed to remove noise while preserving texture details. However, existing methods still suffer from two problems: (1) relevant temporal features in low-contrast ultrasound video cannot be accurately aligned and effectively aggregated by simple optical flow or motion estimation, resulting in artifacts and motion blur in the video; (2) a fixed receptive field in spatial feature integration lacks the flexibility to aggregate features across the global region of interest and is susceptible to interference from irrelevant noisy regions. In this work, we propose a deformable spatial-temporal attention denoising network to remove speckle noise in thyroid ultrasound video. The entire network follows a bidirectional feature propagation mechanism to efficiently exploit the spatial-temporal information of the whole video sequence. Within this process, two modules are proposed to address the above problems: (1) a deformable temporal attention module (DTAM) is applied after optical flow pre-alignment to further capture and aggregate relevant temporal features according to learned inter-frame offsets, so that inter-frame information can be better exploited even when flow estimation is imprecise under the low contrast of ultrasound video; (2) a deformable spatial attention module (DSAM) flexibly integrates spatial features across the global region of interest through learned intra-frame offsets, so that irrelevant noisy information can be ignored and essential information can be precisely exploited. Finally, all refined features are rectified and merged through residual convolution blocks to recover the clean video frames.
Experimental results on our thyroid ultrasound video (US-V) dataset and the DDTI dataset demonstrate that the proposed method outperforms other state-of-the-art methods by 1.2 to 1.3 dB in PSNR and recovers clearer texture detail. The proposed model can also help thyroid nodule segmentation methods achieve more accurate segmentation, which provides an important basis for thyroid diagnosis. In the future, the proposed model can be improved and extended to other medical image sequence datasets, including CT and MRI slice denoising. The code and datasets are available at https://github.com/Meta-MJ/DSTAN.
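
To make the reported PSNR gain concrete, the following is a minimal sketch of how PSNR is conventionally computed for a frame and averaged over a video. It is not taken from the DSTAN repository: the function names `psnr` and `video_psnr`, the nested-list frame representation, and the 8-bit peak value of 255 are assumptions chosen for illustration, and the paper's own evaluation protocol may differ (for example in normalization or in how frames are averaged).

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Return the PSNR in dB between two frames given as nested lists of pixels.

    max_value is the assumed peak pixel value (255 for 8-bit images).
    """
    ref_flat = [p for row in reference for p in row]
    est_flat = [p for row in estimate for p in row]
    if not ref_flat or len(ref_flat) != len(est_flat):
        raise ValueError("frames must be non-empty and have the same size")
    # Mean squared error over all pixels of the frame.
    mse = sum((r - e) ** 2 for r, e in zip(ref_flat, est_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(reference_frames, estimate_frames, max_value=255.0):
    """Return the mean per-frame PSNR of a video (hypothetical averaging choice)."""
    if not reference_frames or len(reference_frames) != len(estimate_frames):
        raise ValueError("videos must be non-empty and have the same length")
    scores = [psnr(r, e, max_value)
              for r, e in zip(reference_frames, estimate_frames)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    clean = [[10, 10], [10, 10]]
    denoised = [[11, 9], [11, 9]]  # every pixel is off by one, so MSE = 1
    print(psnr(clean, denoised))
```

Because PSNR is logarithmic in the mean squared error, a gain of 1.2 to 1.3 dB at a fixed peak value corresponds to a mean squared error that is lower by a factor of about 10^0.12 to 10^0.13, that is, roughly 24% to 26% lower.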

Multi-task Video Enhancement for Dental Interventions

Real-Time Multi-Label Upper Gastrointestinal Anatomy Recognition from Gastroscope Videos

A real-time interactive restoration system for intraoral digital videos using segment anything model

A deep learning framework for quality assessment and restoration in video endoscopy

DPML: Prior-Guided Multitask Learning for Dental Object Recognition on Limited Panoramic Radiograph Dataset

DetSegDiff: A joint periodontal landmark detection and segmentation in intraoral ultrasound using edge-enhanced diffusion-based network

TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

A cross-temporal multimodal fusion system based on deep learning for orthodontic monitoring

Unsupervised Microscopy Video Denoising

A novel bronchoscopic video enhancement and tissue segmentation method based on Eulerian video magnification

Multi-task Fundus Image Quality Assessment Via Transfer Learning and Landmarks Detection

OralViewer: 3D Demonstration of Dental Surgeries for Patient Education with Oral Cavity Reconstruction from a 2D Panoramic X-ray

AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications

Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network

Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract

Caries and Restoration Detection Using Bitewing Film Based on Transfer Learning with CNNs

DSTAN: A Deformable Spatial-temporal Attention Network with Bidirectional Sequence Feature Refinement for Speckle Noise Removal in Thyroid Ultrasound Video

Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network

DMAF-Net: Deformable multi-scale adaptive fusion network for dental structure detection with panoramic radiographs

DeMambaNet: Deformable Convolution and Mamba Integration Network for High-Precision Segmentation of Ambiguously Defined Dental Radicular Boundaries

Automated Dental Image Analysis by Deep Learning on Small Dataset