Frame-to-video-based Semi-supervised Lung Ultrasound Scoring Model

Wenyu Xing, Yiwen Liu, Chao He, Xin Liu, Yifang Li, Wenfang Li, Jiangang Chen, Dean Ta
DOI: https://doi.org/10.1109/IUS51837.2023.10307376
2023-01-01
Abstract: Lung ultrasound (LUS) is a valuable tool for real-time bedside monitoring of critically ill patients with respiratory diseases. Previous studies have focused primarily on frame-level (FL) LUS scoring, whereas this paper proposes a video-level (VL) LUS scoring model. First, LUS frames were scored by experienced clinicians and used to train the LUS-FL scoring model. This model is based on the dual attention vision transformer (DaViT), which uses hybrid self-attention mechanisms to extract local and global features from LUS frames. Second, each LUS video was represented by 40 sequential keyframes selected at intervals of 5, and every keyframe was scored by the trained DaViT-based LUS-FL scoring model. The sum of the 40 frame scores was used to describe the original LUS video and to classify it as mild, moderate, or severe. Third, the resulting one-to-one video-label pairs were used to build the novel LUS-VL scoring model, which takes a 40-channel input with a patch embedding layer. The DaViT parameters in each channel were transferred from the trained LUS-FL scoring model. The 40 channel outputs were then correlated by a long short-term memory (LSTM) module for the video scoring task, followed by a final MLP head. A total of 4000 frames and 1027 videos were used to train and test the frame-to-video-based scoring model with 5-fold cross-validation. Experimental results show that the proposed model performs well in LUS scoring, with accuracies of 95.08% and 92.59% at the frame level and video level, respectively.
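The frame-to-video labelling step described in the abstract (40 keyframes per video, frame scores summed, sum mapped to mild, moderate, or severe) can be illustrated with a minimal Python sketch. Several details are assumptions, not taken from the paper: "5 intervals" is read as taking every 5th frame, frame scores are assumed to be integers from 0 to 3, and the equal-width severity cut-offs are hypothetical placeholders. The function names `select_keyframe_indices` and `video_severity` are likewise illustrative.

```python
from typing import List, Sequence

NUM_KEYFRAMES = 40  # stated in the abstract
FRAME_STEP = 5      # assumption: "5 intervals" means every 5th frame


def select_keyframe_indices(num_frames: int, start: int = 0) -> List[int]:
    """Return the indices of 40 sequential keyframes spaced FRAME_STEP apart."""
    last = start + (NUM_KEYFRAMES - 1) * FRAME_STEP
    if start < 0 or last >= num_frames:
        raise ValueError("video too short for 40 keyframes at this step")
    return list(range(start, last + 1, FRAME_STEP))


def video_severity(frame_scores: Sequence[int], max_frame_score: int = 3) -> str:
    """Sum the 40 frame-level scores and map the sum to a severity label.

    Assumption: each frame score lies in 0..max_frame_score, and the
    severity bands are equal thirds of the maximum possible sum. The
    paper's actual cut-offs are not given in the abstract.
    """
    if len(frame_scores) != NUM_KEYFRAMES:
        raise ValueError("expected one score per keyframe")
    total = sum(frame_scores)
    max_total = NUM_KEYFRAMES * max_frame_score
    if total * 3 <= max_total:
        return "mild"
    if total * 3 <= 2 * max_total:
        return "moderate"
    return "severe"
```

In the pipeline the abstract describes, the frame scores would come from the trained DaViT-based LUS-FL model, and the resulting label would serve as the video-level training target for the LUS-VL model.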