Abstract: Objective Tongue segmentation is the basis of automated tongue recognition studies in Chinese medicine, but current segmentation networks suffer from defects such as network degradation and an inability to obtain global features, which seriously degrade segmentation quality. This article proposes RTC_TongueNet, an improved model based on DeepLabV3 that combines an improved residual structure with a transformer and integrates the ECA (Efficient Channel Attention) mechanism with multiscale atrous convolution to improve tongue image segmentation.
Methods We improve the DeepLabV3 backbone by incorporating a transformer structure and an improved residual structure. The residual module is split into two variants, applied under different conditions, so that shallow information is mapped to the deeper layers more frequently and the low-level features of the tongue image are extracted more effectively. An ECA attention mechanism is introduced after the concat operation in the ASPP (Atrous Spatial Pyramid Pooling) structure to strengthen information interaction and fusion, extract local and global features effectively, and focus the model on regions that are difficult to separate, such as the tongue edge, for better segmentation.
Results RTC_TongueNet was compared with FCN (Fully Convolutional Networks), UNet, LRASPP (Lite Reduced ASPP), and DeepLabV3 on two datasets. On both datasets, the classic DeepLabV3 model achieved higher MIOU (Mean Intersection over Union) and MPA (Mean Pixel Accuracy) than FCN, UNet, and LRASPP. Compared with DeepLabV3, RTC_TongueNet increased MIOU by 0.9% and MPA by 0.3% on the first dataset, and MIOU by 1.0% and MPA by 1.1% on the second. RTC_TongueNet performed best on both datasets.
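
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MIOU and MPA are commonly computed from a pixel-level confusion matrix. It assumes the standard definitions (per-class IoU and per-class pixel accuracy, each averaged over classes); the abstract does not state the exact formulas used, and the function names and toy masks are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that occur at all."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                      # class-c pixels that were missed
        fp = sum(row[c] for row in cm) - tp       # other pixels labelled as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm):
    """Average of TP / (ground-truth pixel count) over classes that are present."""
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        if total > 0:
            accs.append(cm[c][c] / total)
    return sum(accs) / len(accs)


# Toy flattened masks (assumed labels: 0 = background, 1 = tongue).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(truth, pred, num_classes=2)
miou = mean_iou(cm)              # 0.6 for this toy example
mpa = mean_pixel_accuracy(cm)    # 0.75 for this toy example
```

Because MIOU penalizes both false positives and false negatives while MPA only measures recall per class, the two metrics can move by different amounts for the same model change.
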
Conclusion Building on DeepLabV3, this study uses the improved residual structure and a transformer as the backbone to extract image features both locally and globally. The ECA attention module is added to enhance channel attention, strengthening useful information and weakening the interference of useless information. The RTC_TongueNet model segments tongue images effectively and has practical and reference value for tongue image segmentation.
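
The channel reweighting performed by an ECA-style module can be illustrated with a dependency-free sketch. This is not the authors' implementation: it assumes the usual ECA recipe (global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, then channel-wise rescaling), and the function name `eca`, the fixed kernel weights, and the toy feature map are hypothetical. In a trained network the kernel weights are learned, and in RTC_TongueNet the module is applied to the concatenated ASPP output.

```python
import math


def eca(feature_map, kernel):
    """Apply ECA-style channel attention.

    feature_map: list of C channels, each an H x W nested list of floats.
    kernel: odd-length list of 1D convolution weights shared by all channels
            (learned in a real network; fixed here for illustration).
    Returns the rescaled feature map and the per-channel attention weights.
    """
    # 1. Squeeze: global average pooling yields one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2. Local cross-channel interaction: zero-padded 1D convolution
    #    over the channel axis, followed by a sigmoid gate.
    half = len(kernel) // 2
    num_channels = len(pooled)
    weights = []
    for c in range(num_channels):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = c + j - half
            if 0 <= idx < num_channels:
                s += w * pooled[idx]
        weights.append(1.0 / (1.0 + math.exp(-s)))

    # 3. Excite: scale every value in a channel by that channel's weight.
    out = [
        [[v * weights[c] for v in row] for row in ch]
        for c, ch in enumerate(feature_map)
    ]
    return out, weights


# Toy input: three 2 x 2 channels with increasing mean activation.
fmap = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
out, weights = eca(fmap, kernel=[0.0, 1.0, 0.0])
```

With this identity-like kernel, channels with larger mean activation receive larger gates, which is the sense in which the module strengthens some channels and weakens others.
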

Reducing Tongue Shape Dimensionality from Hundreds of Available Resources Using Autoencoder

Exploiting SDAE Model for Recommendations

Auto-Encoder Based Dimensionality Reduction

Dimensionality Reduction Strategy Based on Auto-Encoder

Denoising convolutional autoencoder based B-mode ultrasound tongue image feature extraction

Automatic Tongue Crack Extraction For Real-Time Diagnosis

Tongue contour extraction from ultrasound images based on deep neural network

Tongue feature recognition to monitor rehabilitation: deep neural network with visual attention mechanism

A Hybrid Autoencoder Framework of Dimensionality Reduction for Brain-Computer Interface Decoding

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

RTC_TongueNet: An improved tongue image segmentation model based on DeepLabV3

DAFT-Net: Dual Attention and Fast Tongue Contour Extraction Using Enhanced U-Net Architecture

Tongue shape conversion with non-parallel training data

Deep-DSP: deep convolutional network with double spatial pyramid for tongue image segmentation

Deep Upscale U-Net for automatic tongue segmentation

TISNet-Enhanced Fully Convolutional Network with Encoder-Decoder Structure for Tongue Image Segmentation in Traditional Chinese Medicine

$ε$-VAE: Denoising as Visual Decoding

A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese.

Speaker-independent Lips and Tongue Visualization of Vowels

Automatic Tongue Delineation from MRI Images with a Convolutional Neural Network Approach