DAFT-Net: Dual Attention and Fast Tongue Contour Extraction Using Enhanced U-Net Architecture

Xinqiang Wang, Wenhuan Lu, Hengxin Liu, Wei Zhang, Qiang Li
DOI: https://doi.org/10.3390/e26060482
IF: 2.738
2024-05-31
Entropy
Abstract: In most silent speech research, continuously observing tongue movements is crucial, thus requiring the use of ultrasound to extract tongue contours. Precisely and in real-time extracting ultrasonic tongue contours presents a major challenge. To tackle this challenge, the novel end-to-end lightweight network DAFT-Net is introduced for ultrasonic tongue contour extraction. Integrating the Convolutional Block Attention Module (CBAM) and Attention Gate (AG) module with entropy-based optimization strategies, DAFT-Net establishes a comprehensive attention mechanism with dual functionality. This innovative approach enhances feature representation by replacing traditional skip connection architecture, thus leveraging entropy and information-theoretic measures to ensure efficient and precise feature selection. Additionally, the U-Net's encoder and decoder layers have been streamlined to reduce computational demands. This process is further supported by information theory, thus guiding the reduction without compromising the network's ability to capture and utilize critical information. Ablation studies confirm the efficacy of the integrated attention module and its components. The comparative analysis of the NS, TGU, and TIMIT datasets shows that DAFT-Net efficiently extracts relevant features, and it significantly reduces extraction time. These findings demonstrate the practical advantages of applying entropy and information theory principles. This approach improves the performance of medical image segmentation networks, thus paving the way for real-world applications.
physics, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the problem of extracting tongue contours from ultrasound images accurately and in real time, which is crucial for silent speech research. Specifically, it proposes a novel end-to-end lightweight network, DAFT-Net (Dual Attention and Fast Tongue Contour Extraction using Enhanced U-Net Architecture), for this task.

The main challenges mentioned in the paper include:
1. **Gaussian speckle noise**: Commonly present in ultrasound tongue images, it reduces the clarity of the tongue contour.
2. **Structural occlusion**: Structures such as the hyoid bone and jawbone may block the propagation of ultrasound waves, degrading image quality.
3. **Low reflectivity**: The low reflectivity of tongue muscle fibers results in incomplete echo paths, which breaks the continuity and integrity of the contour.
4. **Image artifacts**: Artifacts generated as the soft tissue of the tongue changes position make the contour difficult to identify.
5. **Manual initialization requirement**: Early methods required manual marking for initialization, limiting real-time tracking capabilities.
6. **Data quality and algorithm selection**: The accuracy of tongue contours depends heavily on the quality of the ultrasound data and on the chosen contour tracking algorithm.
7. **Speed limitations of semi-automatic or manual methods**: Current methods are mostly semi-automatic or manual, which limits extraction speed.
8. **Computational resource consumption**: As the number of network parameters and the input image resolution increase, computational demands rise sharply, hindering real-time processing.

To address these issues, the proposed DAFT-Net has the following features:
- **Simplified design**: The number of convolutional layers in each encoder and decoder block is reduced to lower computational demands.
- **Integrated attention modules**: The Convolutional Block Attention Module (CBAM) and Attention Gate (AG) module are combined into a dual-function attention mechanism that replaces the traditional skip connections, thereby enhancing feature representation.
- **Information theory optimization**: Entropy and information theory principles are used to guide feature selection, so that key features are selected efficiently and accurately.
- **Experimental validation**: Comparative analysis on three ultrasound datasets (NS, TGU, and TIMIT) shows that DAFT-Net extracts relevant features effectively and significantly reduces extraction time.

In summary, DAFT-Net aims to overcome the limitations of existing methods and achieve faster, more accurate ultrasound tongue contour extraction, in support of silent speech interfaces and related applications.
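
To give a rough sense of why the "Simplified design" feature lowers computational demands, the sketch below counts the weights in a U-Net-style encoder with two 3x3 convolutions per block versus one. This is an illustration only, not the paper's configuration: the channel widths (the classic U-Net 64 to 1024), the single-channel input, the 3x3 kernel size, and the helper names `conv_params` and `encoder_params` are all assumptions, since this summary does not state DAFT-Net's actual layer counts or widths.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Number of learnable parameters in one k x k 2-D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def encoder_params(widths, convs_per_block, in_channels=1):
    """Total conv parameters of an encoder with `convs_per_block` convs per stage.

    Hypothetical helper: the first conv of each stage changes the channel
    count, and any remaining convs in that stage keep it fixed.
    """
    total = 0
    c_in = in_channels
    for c_out in widths:
        for _ in range(convs_per_block):
            total += conv_params(c_in, c_out)
            c_in = c_out
    return total


# Assumed widths (classic U-Net); DAFT-Net's real widths may differ.
WIDTHS = (64, 128, 256, 512, 1024)

baseline = encoder_params(WIDTHS, convs_per_block=2)
slim = encoder_params(WIDTHS, convs_per_block=1)

print(f"two convs per block: {baseline:,} parameters")
print(f"one conv per block:  {slim:,} parameters")
print(f"ratio: {slim / baseline:.2f}")
```

The count covers encoder convolutions only; decoder blocks, the attention modules, and normalization layers are left out, so the numbers indicate the trend and not the size of any real model.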