wav2sleep: A Unified Multi-Modal Approach to Sleep Stage Classification from Physiological Signals

Jonathan F. Carter, Lionel Tarassenko
2024-11-07
Abstract: Accurate classification of sleep stages from less obtrusive sensor measurements such as the electrocardiogram (ECG) or photoplethysmogram (PPG) could enable important applications in sleep medicine. Existing approaches to this problem have typically used deep learning models designed and trained to operate on one or more specific input signals. However, the datasets used to develop these models often do not contain the same sets of input signals. Some signals, particularly PPG, are much less prevalent than others, and this has previously been addressed with techniques such as transfer learning. Additionally, only training on one or more fixed modalities precludes cross-modal information transfer from other sources, which has proved valuable in other problem domains. To address this, we introduce wav2sleep, a unified model designed to operate on variable sets of input signals during training and inference. After jointly training on over 10,000 overnight recordings from six publicly available polysomnography datasets, including SHHS and MESA, wav2sleep outperforms existing sleep stage classification models across test-time input combinations including ECG, PPG, and respiratory signals.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to classify sleep stages accurately from less obtrusive sensor measurements, such as the electrocardiogram (ECG) or photoplethysmogram (PPG), which could enable important applications in sleep medicine. Existing methods typically use deep learning models designed and trained for one or more specific input signals, but the datasets used to develop these models often do not contain the same sets of input signals. PPG in particular is much less prevalent than other signals, a gap previously addressed with techniques such as transfer learning. Moreover, training only on one or more fixed modalities precludes cross-modal information transfer from other sources, which has proved valuable in other problem domains. To address these issues, the paper introduces wav2sleep, a unified model designed to operate on variable sets of input signals during training and inference. After joint training on more than 10,000 overnight recordings from six publicly available polysomnography datasets (including SHHS and MESA), wav2sleep outperforms existing sleep stage classification models across test-time input combinations including ECG, PPG, and respiratory signals.

Specifically, this research aims to:

1. **Improve the accuracy of sleep stage classification**: Leverage multiple physiological signals to reduce the limitations of relying on a single signal and to improve classification accuracy and robustness.
2. **Address dataset heterogeneity**: Different datasets contain different signals, and many signals are discarded due to poor quality. Joint training lets the model adapt to this heterogeneity.
3. **Achieve cross-modal information transfer**: Through joint training, enable the model to extract and transfer useful information across modalities and improve overall performance.
4. **Simplify model deployment**: A unified model reduces operational complexity in practice, since only one model needs to be trained, validated, and deployed.

Together, these goals position wav2sleep as a more flexible and powerful approach to the challenges of sleep stage classification.
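To make the idea of operating on "variable sets of input signals" concrete, here is a minimal sketch in plain Python. It is not the wav2sleep architecture: the summary above does not describe how the model combines signals, so the mean-over-available-modalities fusion, the `fuse_available` function, and the per-epoch feature layout are all assumptions chosen purely for illustration. The point is only that a single function can accept any non-empty subset of signals and return an output of the same shape.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one feature vector per sleep epoch for each signal.
Signal = List[List[float]]


def fuse_available(features: Dict[str, Optional[Signal]]) -> Signal:
    """Average per-epoch feature vectors over whichever signals are present.

    A signal that is missing from a recording is passed as None and skipped.
    Mean fusion is an illustrative assumption, not the paper's method.
    """
    present = [f for f in features.values() if f is not None]
    if not present:
        raise ValueError("at least one input signal is required")

    n_epochs = len(present[0])
    dim = len(present[0][0])
    for f in present:
        if len(f) != n_epochs or any(len(vec) != dim for vec in f):
            raise ValueError("signals must share epoch count and feature size")

    fused: Signal = []
    for t in range(n_epochs):
        # Element-wise mean across the signals available at epoch t.
        fused.append(
            [sum(f[t][d] for f in present) / len(present) for d in range(dim)]
        )
    return fused


# Two epochs, two features per epoch (toy values).
ecg = [[1.0, 2.0], [3.0, 4.0]]
ppg = [[3.0, 4.0], [5.0, 6.0]]

both = fuse_available({"ECG": ecg, "PPG": ppg, "RESP": None})
ecg_only = fuse_available({"ECG": ecg, "PPG": None, "RESP": None})
```

Because the fused output has the same shape whether one or several signals are supplied, any downstream classifier in this sketch would not need to know which signals were recorded. That is the property a unified model relies on when datasets disagree about which signals they contain.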