Quantifying the Impact of Data Characteristics on the Transferability of Sleep Stage Scoring Models

Akara Supratak, Peter Haddawy
DOI: https://doi.org/10.1016/j.artmed.2023.102540
2023-03-28
Abstract: Deep learning models for scoring sleep stages based on single-channel EEG have been proposed as a promising method for remote sleep monitoring. However, applying these models to new datasets, particularly from wearable devices, raises two questions. First, when annotations on a target dataset are unavailable, which different data characteristics affect the sleep stage scoring performance the most and by how much? Second, when annotations are available, which dataset should be used as the source of transfer learning to optimize performance? In this paper, we propose a novel method for computationally quantifying the impact of different data characteristics on the transferability of deep learning models. Quantification is accomplished by training and evaluating two models with significant architectural differences, TinySleepNet and U-Time, under various transfer configurations in which the source and target datasets have different recording channels, recording environments, and subject conditions. For the first question, the environment had the highest impact on sleep stage scoring performance, with performance degrading by over 14% when sleep annotations were unavailable. For the second question, the most useful transfer sources for TinySleepNet and the U-Time models were MASS-SS1 and ISRUC-SG1, containing a high percentage of N1 (the rarest sleep stage) relative to the others. The frontal and central EEGs were preferred for TinySleepNet. The proposed approach enables full utilization of existing sleep datasets for training and planning model transfer to maximize the sleep stage scoring performance on a target problem when sleep annotations are limited or unavailable, supporting the realization of remote sleep monitoring.
Signal Processing, Machine Learning
What problem does this paper attempt to address?
The paper primarily aims to address two core issues:

1. **How significant is the impact of different data characteristics on sleep stage scoring performance?** When the target dataset has no annotations, which data characteristics (recording channels, recording environment, or subject conditions) affect sleep stage scoring performance the most, and by how much?
2. **When a few annotations are available for the target dataset, from which dataset should transfer learning be performed to optimize performance?** That is, when the target dataset has some annotated data but not enough to train a model from scratch, which existing dataset is the most suitable transfer source for maximizing performance on the target dataset?

To address these issues, the authors propose a novel method for quantifying the impact of different data characteristics on the transferability of deep learning models, measuring transferability by fine-tuning the models under different settings. They used two models with significant architectural differences, TinySleepNet and U-Time, and ran experiments in which the source and target datasets differ in recording channels, recording environments, and subject conditions.

- For the first issue, the study found that the recording environment has the greatest impact on sleep stage scoring performance, with performance dropping by more than 14% when sleep annotations are unavailable.
- For the second issue, the study identified MASS-SS1 and ISRUC-SG1 as the best transfer source datasets for the TinySleepNet and U-Time models. Both datasets contain a high proportion of N1 (the rarest sleep stage) relative to the other datasets. For the TinySleepNet model, EEG signals from the frontal and central regions are preferred.

With this method, researchers can make full use of existing sleep datasets for training and plan model transfer to maximize sleep stage scoring performance on a target problem, especially when sleep annotations are limited or unavailable. This supports the development of remote sleep monitoring.
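
The comparison behind the first issue can be illustrated with a small sketch. It assumes that each transfer configuration yields one baseline score (model trained and tested on the target dataset) and one direct-transfer score (model trained on the source, applied to the target without annotations), and that the impact of a characteristic is the mean relative drop between the two. The paper's exact metric and aggregation may differ. The function names and all numbers below are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (differing characteristic, baseline score, direct-transfer score).
# These values are made-up placeholders for illustration, NOT results from the paper.
RUNS = [
    ("channel", 0.80, 0.76),
    ("channel", 0.80, 0.72),
    ("environment", 0.80, 0.60),
    ("environment", 0.50, 0.40),
    ("subject_condition", 0.80, 0.70),
]


def relative_drop(baseline: float, transferred: float) -> float:
    """Return the percentage drop of the transferred score relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - transferred) / baseline


def mean_drop_by_characteristic(runs):
    """Average the relative drop over all runs that share a differing characteristic."""
    drops = defaultdict(list)
    for characteristic, baseline, transferred in runs:
        drops[characteristic].append(relative_drop(baseline, transferred))
    return {name: mean(values) for name, values in drops.items()}


def rank_characteristics(runs):
    """Sort characteristics from the largest mean drop to the smallest."""
    averaged = mean_drop_by_characteristic(runs)
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, drop in rank_characteristics(RUNS):
        print(f"{name}: {drop:.1f}% mean drop")
```

The same ranking idea could be applied to the second issue by grouping fine-tuned scores by source dataset instead of by differing characteristic, again under the assumption that a single summary score per configuration is available.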