Abstract: This paper introduces the Efficient Decoupled Masked Autoencoder (EDMAE), a novel self-supervised method for recognizing standard views in pediatric echocardiography. EDMAE introduces a new proxy task based on the encoder-decoder structure. The EDMAE encoder is composed of a teacher encoder and a student encoder. The teacher encoder extracts the latent representation of the masked image blocks, while the student encoder extracts the latent representation of the visible image blocks. A loss is calculated between the feature maps output by the two encoders to keep the latent representations they extract consistent. EDMAE uses pure convolution operations instead of the ViT structure in the MAE encoder, which improves training efficiency and convergence speed. EDMAE is pre-trained on a large-scale private dataset of pediatric echocardiography using self-supervised learning, and then fine-tuned for standard view recognition. The proposed method achieves high classification accuracy across 27 standard views of pediatric echocardiography. To further verify the effectiveness of the proposed method, the authors perform another downstream task, cardiac ultrasound segmentation, on the public dataset CAMUS. The experimental results demonstrate that the proposed method outperforms some popular supervised and recent self-supervised methods, and is more competitive on different downstream tasks.
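
The split into visible and masked blocks, and the consistency loss between the two encoders' outputs, can be illustrated with a minimal sketch. Everything named here is hypothetical: `split_patches` and `alignment_loss` are illustrative helpers, the 196-patch grid is an assumed 14 x 14 layout, and mean squared error is an assumed choice of loss, since the abstract does not state which loss is used or how the two feature maps are brought to the same shape. The 0.75 mask ratio is the rate cited in the method summary further down this page.

```python
import random


def split_patches(num_patches, mask_ratio, seed=0):
    """Randomly split patch indices into (visible, masked) index lists."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(indices[:num_masked])    # would go to the teacher encoder
    visible = sorted(indices[num_masked:])   # would go to the student encoder
    return visible, masked


def alignment_loss(student_feats, teacher_feats):
    """Mean squared error between two same-shaped feature maps.

    Each argument is a list of feature vectors (lists of floats).
    MSE is an assumption; the source does not name the loss.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature maps must contain the same number of vectors")
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        if len(s_vec) != len(t_vec):
            raise ValueError("feature vectors must have the same length")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("feature maps must not be empty")
    return total / count


if __name__ == "__main__":
    # Assumed 14 x 14 patch grid; 0.75 is the masking rate cited on this page.
    visible, masked = split_patches(num_patches=196, mask_ratio=0.75)
    print(len(visible), "visible patches,", len(masked), "masked patches")
```

In a real model the two feature maps would come from the convolutional encoders; here they are plain nested lists so the loss can be checked by hand.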

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the problem of standard view recognition in pediatric echocardiography. Specifically, the authors propose a novel self-supervised method named **Efficient Decoupled Masked Autoencoder (EDMAE)** for recognizing standard views in pediatric echocardiography.

### Background and Motivation

Congenital heart diseases (CHDs) are among the most common birth defects, with approximately 100,000 to 150,000 newborns diagnosed with CHD each year in China. Early and accurate diagnosis is of great clinical significance. Transthoracic echocardiography (TTE) is a cost-effective, non-invasive, and radiation-free imaging technique that can visualize the heart dynamically in real time. TTE has become an important tool for diagnosing and treating CHD because it can quickly detect various cardiac abnormalities. However, due to the complexity of congenital heart diseases and the diversity of spatial configurations, accurate diagnosis through TTE is time-consuming and requires experienced professionals.

### Research Objectives

1. **Automatic Recognition of Standard Views**: Develop a method that can automatically recognize standard views in pediatric echocardiography to improve diagnostic efficiency and accuracy.
2. **Reduce the Need for Labeled Data**: Utilize self-supervised learning methods to reduce the reliance on large amounts of labeled data, thereby lowering the cost and time of data annotation.
3. **Improve Model Performance**: Design an efficient decoupled masked autoencoder (EDMAE) to enhance the model's performance in various downstream tasks, such as standard view recognition and cardiac ultrasound segmentation.

### Method Overview

1. **EDMAE Model**:
   - **Decoupled Structure**: EDMAE employs two identical encoders, one as the teacher encoder and the other as the student encoder. The student encoder processes visible image patches and updates its weights through training, while the teacher encoder processes masked image patches, with its weights updated from the student encoder.
   - **Feature Alignment**: A loss computed between the feature maps output by the two encoders keeps the representations of masked and visible image patches consistent.
   - **Convolution Operations**: Pure convolution operations are used instead of the ViT structure, improving training efficiency and convergence speed.
2. **Self-Supervised Pretraining**:
   - Conduct self-supervised pretraining on a large-scale unlabeled pediatric cardiac ultrasound image dataset.
   - Apply a 75% masking rate and optimize the network parameters so that the model generates high-quality reconstructed images.
3. **Downstream Tasks**:
   - **Standard View Recognition**: Replace the decoder with a linear layer and fine-tune using the cross-entropy loss function.
   - **Cardiac Ultrasound Segmentation**: Use the decoder of DenseUNet as the segmentation head to output segmentation results and calculate the loss using Focal Loss.

### Experimental Results

1. **Experiments on Private Dataset**:
   - On a private dataset containing 27 standard views, the EDMAE method outperformed other mainstream classification networks and self-supervised methods in terms of overall accuracy, precision, recall, specificity, and F1 score.
   - Specific metrics are as follows:
     - Overall Accuracy: 98.48%
     - Average Precision: 93.20%
     - Average Recall: 94.62%
     - Average Specificity: 99.73%
     - Average F1 Score: 93.63%
2. **Experiments on Public Dataset CAMUS**:
   - The effectiveness of the EDMAE method was validated on the CAMUS dataset, where it performed well in the cardiac ultrasound segmentation task.

### Conclusion

The EDMAE method proposed in this paper achieved significant performance improvements in the task of standard view recognition in pediatric echocardiography and also demonstrated competitive performance in the cardiac ultrasound segmentation task. This method not only reduces the reliance on large amounts of labeled data but also improves training efficiency and model performance.
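
For readers who want to see how figures such as the overall accuracy and the average precision, recall, specificity, and F1 score reported in the experimental results are typically derived, here is a minimal sketch that computes them from a confusion matrix. It assumes the "average" values are unweighted (macro) means over classes, which the summary does not confirm. The function name `macro_metrics` and the 3-class toy matrix are hypothetical and are not data from the paper.

```python
def macro_metrics(confusion):
    """Compute overall accuracy and macro-averaged per-class metrics.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precision, recall, specificity, f1 = [], [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true other
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        s = tn / (tn + fp) if tn + fp else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision.append(p)
        recall.append(r)
        specificity.append(s)
        f1.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "specificity": sum(specificity) / n,
        "f1": sum(f1) / n,
    }


if __name__ == "__main__":
    # Toy 3-class confusion matrix (illustrative values only).
    toy = [
        [5, 1, 0],
        [0, 4, 1],
        [1, 0, 3],
    ]
    for name, value in macro_metrics(toy).items():
        print(f"{name}: {value:.4f}")
```

With many classes, specificity tends to be high because each class has far more true negatives than false positives, so a specificity well above the precision and recall values is expected in a 27-view setting.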

EDMAE: An Efficient Decoupled Masked Autoencoder for Standard View Identification in Pediatric Echocardiography

A Deep Learning Based Approach for Automatic Cardiac Events Identification

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds

A Deep Learning-Based Method for Pediatric Congenital Heart Disease Detection with Seven Standard Views in Echocardiography

Medical supervised masked autoencoders: Crafting a better masking strategy and efficient fine-tuning schedule for medical image classification

A multi-task deep learning approach for real-time view classification and quality assessment of echocardiographic images

Echo-Vision-FM: A Pre-training and Fine-tuning Framework for Echocardiogram Videos Vision Foundation Model

SdAE: Self-distillated Masked Autoencoder

Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoder

Towards Accurate Cardiac MRI Segmentation with Variational Autoencoder-Based Unsupervised Domain Adaptation

Toward Accurate Cardiac MRI Segmentation With Variational Autoencoder-Based Unsupervised Domain Adaptation

Unsupervised Pre-Training Using Masked Autoencoders for ECG Analysis

Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition

Wave masked Autoencoder: an electrocardiogram signal diagnosis model based on wave making strategy

Standard Echocardiographic View Recognition in Diagnosis of Congenital Heart Defects in Children Using Deep Learning Based on Knowledge Distillation

Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation

A Task-Generic High-Performance Unsupervised Pre-Training Framework for ECG

Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG

Real-Time Automatic M-Mode Echocardiography Measurement With Panel Attention

Efficient Masked Autoencoders with Self-Consistency