SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, James Zou
2024-05-28
Abstract: Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.
Machine Learning, Artificial Intelligence, Signal Processing
What problem does this paper attempt to address?
The paper addresses automatic analysis of sleep recordings, in particular sleep stage classification and Sleep Disordered Breathing (SDB) detection, using multimodal representation learning. The research team developed a multimodal foundation model named SleepFM that integrates Brain Activity Signals (BAS), Electrocardiogram (ECG), and respiratory signals to improve the accuracy of automatic sleep record analysis. The main objectives are:

1. **Developing SleepFM**: A multimodal foundation model based on Contrastive Learning (CL), trained on a large-scale Polysomnography (PSG) dataset, that aims to capture the complementary information across modalities and learn more robust physiological representations.
2. **Proposing a new contrastive learning method**: Introducing a novel "Leave-One-Out" contrastive learning strategy, which contrasts the embedding of one modality with the average embedding of the remaining modalities and significantly improves downstream task performance over standard pairwise contrastive learning.
3. **Evaluating the effectiveness of SleepFM**: Comparing SleepFM embeddings against end-to-end trained Convolutional Neural Network (CNN) models on downstream tasks (sleep stage classification and SDB detection) and showing that SleepFM performs better on both.
4. **Exploring the model's generalization ability**: Running age prediction, gender classification, and cross-modality retrieval experiments to further assess the quality of SleepFM embeddings.
5. **Conducting few-shot evaluation**: Assessing the model's performance with limited labeled data, to understand how it behaves when samples are scarce in real-world applications.
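
The leave-one-out idea in objective 2 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `leave_one_out_loss`, the temperature value, the use of cosine similarity, and the one-directional cross-entropy over the batch are all assumptions made for illustration. It assumes each modality has already been encoded into a `(batch, dim)` array whose rows are aligned, so that row `k` of every modality comes from the same recording clip.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def leave_one_out_loss(embeddings, temperature=0.1):
    """Illustrative leave-one-out contrastive loss (assumed form, not the paper's code).

    embeddings: dict mapping a modality name to a (batch, dim) array.
    For each held-out modality, its embedding of clip k is contrasted with the
    average embedding of the remaining modalities: the average for the same
    clip k is the positive, averages for the other clips in the batch are
    the negatives.
    """
    names = list(embeddings)
    z = {name: l2_normalize(np.asarray(embeddings[name], dtype=np.float64))
         for name in names}

    total = 0.0
    for held_out in names:
        # Average the embeddings of every modality except the held-out one.
        rest = np.mean([z[name] for name in names if name != held_out], axis=0)
        rest = l2_normalize(rest)

        # Similarity of every held-out clip to every averaged clip: (batch, batch).
        logits = z[held_out] @ rest.T / temperature

        # Row-wise log-softmax, computed stably; the diagonal holds the positives.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        total += -np.mean(np.diag(log_probs))

    return total / len(names)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clips = rng.normal(size=(8, 16))  # stand-in for encoder outputs
    batch = {m: clips + 0.01 * rng.normal(size=clips.shape)
             for m in ("bas", "ecg", "resp")}
    print("leave-one-out loss:", leave_one_out_loss(batch))
```

With three modalities, a pairwise scheme would contrast each pair of modalities directly, whereas this sketch gives each modality a single target built from the other two. Under these assumptions, the loss is lower when rows are correctly aligned across modalities than when one modality's rows are shuffled.
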
In summary, SleepFM aims to advance the diagnosis and monitoring of sleep-related disorders by exploiting the complementary information in multimodal data, thereby improving the accuracy and efficiency of automatic analysis.