A Multimodal Dataset of 21,412 Recorded Nights for Sleep and Respiratory Research

Alon Diament, Maria Gorodetski, Adam Jankelow, Ayya Keshet, Tal Shor, Daphna Weissglas-Volkov, Hagai Rossman, Eran Segal
2023-11-15
Abstract: This study introduces a novel, rich dataset obtained from home sleep apnea tests using the FDA-approved WatchPAT-300 device, collected from 7,077 participants over 21,412 nights. The dataset comprises three levels of sleep data: raw multi-channel time series from sensors, annotated sleep events, and computed summary statistics, which include 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV). We present reference values for Apnea/Hypopnea Index (AHI), sleep efficiency, Wake After Sleep Onset (WASO), and HRV sample entropy, stratified by age and sex. Moreover, we demonstrate that the dataset improves the predictive capability for various health-related traits, including body composition, bone density, blood sugar levels, and cardiovascular health. These results illustrate the dataset's potential to advance sleep research, personalized healthcare, and machine learning applications in biomedicine.
Machine Learning, Signal Processing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper presents a new, rich dataset obtained from home sleep apnea tests of 7,077 participants over 21,412 nights, recorded with the FDA-approved WatchPAT-300 device. The dataset contains three levels of sleep data: raw multi-channel time series from sensors, annotated sleep events, and computed summary statistics, including 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV).

**Main problems**:

1. **Construction and description of the dataset**:
   - Construct a multimodal dataset containing a large amount of sleep and respiratory data.
   - Provide reference values for key features, such as the apnea-hypopnea index (AHI), sleep efficiency, wake after sleep onset (WASO), and HRV sample entropy, stratified by age and sex.
2. **Clinical applications of the dataset**:
   - Demonstrate the potential of the dataset for predicting various health-related traits, including body composition, bone density, blood glucose level, and cardiovascular health.
   - Verify the clinical utility of the dataset through prediction models, especially for metabolic and cardiovascular phenotypes.
3. **Potential impact of the dataset**:
   - Promote sleep research, personalized medicine, and machine learning applications in biomedicine.
   - Provide a high-quality data resource for future research, especially for studying age-related health changes.

### Detailed explanations of specific problems

1. **Construction and description of the dataset**:
   - **Data sources**: Data were collected with the FDA-approved WatchPAT-300 device from home sleep apnea tests of 7,077 participants over 21,412 nights.
   - **Data levels**:
     - **Raw multi-channel time series**: 12 original, processed, and derived channels, such as actigraphy, heart rate, and pulse oximetry (SpO2).
     - **Annotated sleep events**: Sleep stages, classified apnea events, heart rate variability events, and oxygen desaturation events.
     - **Summary statistics**: 447 features related to sleep architecture, sleep apnea, and heart rate variability.
   - **Reference values**: Reference values are provided for AHI, sleep efficiency, WASO, and HRV sample entropy, stratified by age and sex.
2. **Clinical applications of the dataset**:
   - **Prediction models**: Prediction models are used to demonstrate the potential of the dataset for predicting various health-related traits.
   - **Feature groups**: Three distinct groups of sleep-derived features are identified:
     - **Sleep architecture and quality**: Indicators such as sleep stage duration, sleep efficiency, and sleep fragmentation.
     - **Respiration and sleep apnea**: Variables such as AHI, respiratory disturbance index (RDI), and oxygen desaturation index (ODI).
     - **Heart rate features**: Mainly HRV indicators.
   - **Prediction performance**: The performance difference between each feature-group model and the baseline model is evaluated by mean absolute error (MAE) and correlation, and tested for statistical significance.
3. **Potential impact of the dataset**:
   - **Promote research**: The quality and richness of the dataset make it an important resource for future sleep research, personalized medicine, and machine learning applications in biomedicine.
   - **Interdisciplinary applications**: The dataset is applicable not only to sleep research but also to research in fields such as metabolism and cardiovascular health.

### Conclusion

This paper constructs a multimodal dataset containing a large amount of sleep and respiratory data, demonstrates its potential for predicting various health-related traits, and provides an important data resource for future sleep research and clinical applications.
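The comparison described under "Prediction performance" can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy numbers, and the choice of a paired sign-flip permutation test are all assumptions made here, since the summary does not say which significance test was used.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def paired_sign_flip_test(y_true, pred_base, pred_feat, n_perm=2000, seed=0):
    """Two-sided p-value for the mean difference in absolute error.

    Hypothetical choice of test: under the null hypothesis that both models
    are equally accurate, the sign of each per-sample error difference is
    exchangeable, so signs are flipped at random to build a null distribution.
    """
    rng = random.Random(seed)
    diffs = [abs(t - b) - abs(t - f)
             for t, b, f in zip(y_true, pred_base, pred_feat)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def compare_models(y_true, pred_base, pred_feat):
    """Summarize baseline vs. feature-group model on the same samples."""
    return {
        "mae_baseline": mae(y_true, pred_base),
        "mae_features": mae(y_true, pred_feat),
        "r_baseline": pearson_r(y_true, pred_base),
        "r_features": pearson_r(y_true, pred_feat),
        "p_value": paired_sign_flip_test(y_true, pred_base, pred_feat),
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only; not taken from the dataset.
    target = [float(i) for i in range(20)]
    baseline = [v + (3.0 if i % 2 else -3.0) for i, v in enumerate(target)]
    with_sleep = [v + (0.5 if i % 2 else -0.5) for i, v in enumerate(target)]
    for name, value in compare_models(target, baseline, with_sleep).items():
        print(f"{name}: {value:.4f}")
```

A lower MAE and a higher correlation for the feature-group model, together with a small p-value, would indicate that the sleep-derived features add predictive value beyond the baseline. In practice the predictions being compared should come from held-out data rather than from the samples used to fit the models.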