Accurate Apnea and Hypopnea Localization in PSG with Multi-scale Object Detection Via Dual-modal Feature Learning

Yifeng Ji, Dan Chen, Yiping Zuo, Tengfei Gao, Yunbo Tang
DOI: https://doi.org/10.1016/j.bspc.2023.105717
IF: 5.1
2024-01-01
Biomedical Signal Processing and Control
Abstract: Localization of sleep apnea and hypopnea (SAH) events has routinely relied on expert visual inspection of polysomnography (PSG) recordings, a tedious task that demands a high level of professional skill. Automated detection methods have achieved remarkable success, especially with recent advances in machine learning and deep learning. However, a significant challenge remains on the way to clinical practice: how to accurately discriminate SAH events in PSG, together with the onset and duration of each. This study develops an object detection framework, SAH-MOD, for accurately localizing SAH segments of varied durations in three phases: (1) Dual-modal Feature Learning (DFL, dual-branch 1-D convolutional layers followed by a Concatenate Block). Deep features are efficiently learned and then fused from two different types of respiration-related signals, i.e., nasal airflow and abdominal movement; (2) Feature Map Generation (FMG, cascaded 1-D convolutional layers). Feature maps are generated with multi-scale hierarchical features at different depths of the network, catering for the needs of object (SAH event) detection. Default anchors associated with the scales and receptive fields are tiled onto the corresponding detection feature maps; and (3) Multi-scale Object Detection (MOD). Predictions are then made on all available detection layers, followed by post-processing to accurately capture each SAH event. Experiments have been performed on the dataset of stroke unit recordings for the detection of Obstructive Sleep Apnea Syndrome (OSASUD-dataset), comparing SAH-MOD against state-of-the-art counterparts, and the results indicate that: (1) SAH-MOD performs the best, with a Recall of 81.0% and an F1-score of 71.1%; and (2) it has significant advantages in localizing the onset and duration of each SAH event, with 91.9% of the IoU values between predicted and labeled events falling between 0.6 and 1.0. Ablation experiments show that the introduction of dual-modal feature learning and hierarchical feature maps improves recall performance by 6.9% and 4.1%, respectively.
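The IoU figure quoted in the abstract compares a predicted event with a labeled event as two time intervals. The sketch below shows one way such a 1-D IoU could be computed, together with a greedy non-maximum suppression step of the kind often used as post-processing in anchor-based detectors. The abstract does not specify the post-processing used by SAH-MOD, so `nms_1d`, the `iou_thr` default, and the `(onset, offset)` interval representation in seconds are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical representation: an event is (onset_s, offset_s) in seconds.
Interval = Tuple[float, float]


def iou_1d(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(events: List[Interval], scores: List[float],
           iou_thr: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression over 1-D intervals.

    Assumed post-processing step, not taken from the paper: keep the
    highest-scoring interval, then drop any remaining interval whose
    IoU with an already kept one exceeds iou_thr.
    Returns the indices of the kept intervals, best score first.
    """
    order = sorted(range(len(events)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou_1d(events[i], events[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep
```

For instance, a prediction spanning 15 s to 35 s against a label spanning 10 s to 30 s overlaps for 15 s out of a 25 s union, so `iou_1d((10.0, 30.0), (15.0, 35.0))` evaluates to 0.6, the lower edge of the range reported in the abstract.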