Abstract:Background Despite the unprecedented performance of deep learning algorithms in clinical domains, full reviews of algorithmic predictions by human experts remain mandatory. Under these circumstances, artificial intelligence (AI) models are primarily designed as clinical decision support systems (CDSSs). However, from the perspective of clinical practitioners, the lack of clinical interpretability and user-centered interfaces hinders the adoption of these AI systems in practice. Objective This study aims to develop an AI-based CDSS for assisting polysomnographic technicians in reviewing AI-predicted sleep staging results. This study proposed and evaluated a CDSS that provides clinically sound explanations for AI predictions in a user-centered manner. Methods Our study is based on a user-centered design framework for developing explanations in a CDSS that identifies why explanations are needed, what information should be contained in explanations, and how explanations can be provided in the CDSS. We conducted user interviews, user observation sessions, and an iterative design process to identify three key aspects for designing explanations in the CDSS. After constructing the CDSS, the tool was evaluated to investigate how the CDSS explanations helped technicians. We measured the accuracy of sleep staging and interrater reliability with macro-F1 and Cohen κ scores to assess quantitative improvements after our tool was adopted. We assessed qualitative improvements through participant interviews that established how participants perceived and used the tool. Results The user study revealed that technicians desire explanations that are relevant to key electroencephalogram (EEG) patterns for sleep staging when assessing the correctness of AI predictions. Here, technicians wanted explanations that could be used to evaluate whether the AI models properly locate and use these patterns during prediction. On the basis of this, information that is closely related to sleep EEG patterns was formulated for the AI models. In the iterative design phase, we developed a different visualization strategy for each pattern based on how technicians interpreted the EEG recordings with these patterns during their workflows. Our evaluation study on 9 polysomnographic technicians quantitatively and qualitatively investigated the helpfulness of the tool. For technicians with <5 years of work experience, their quantitative sleep staging performance improved significantly from 56.75 to 60.59 with a P value of .05. Qualitatively, participants reported that the information provided effectively supported them, and they could develop notable adoption strategies for the tool. Conclusions Our findings indicate that formulating clinical explanations for automated predictions using the information in the AI with a user-centered design process is an effective strategy for developing a CDSS for sleep staging.

Automated sleep stage scoring employing a reasoning mechanism and evaluation of its explainability

Automatic Classification of Sleep Stages Based on Raw Single-Channel EEG

An ensemble method for improving robustness against the electrode contact problems in automated sleep stage scoring

Automatic sleep stage scoring based on waveform recognition method and decision-tree learning

Current status and prospects of automatic sleep stages scoring: Review

Automated multi-model deep neural network for sleep stage scoring with unfiltered clinical data

Sleep Staging Framework with Physiologically Harmonized Sub-Networks

Refining sleep staging accuracy: Transfer learning coupled with scorability models

Large-Scale Automated Sleep Staging

DeepSleepNet: A Model for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG

Personalizing deep learning models for automatic sleep staging

Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability

DeepSleepNet-Lite: A Simplified Automatic Sleep Stage Scoring Model with Uncertainty Estimates

An explainable deep-learning model to stage sleep states in children and propose novel EEG-related patterns in sleep apnea

0662 Accurate Automated Sleep Staging of Narcoleptic Patients Using a Machine Learning Model

Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring

A Clinical Decision Support System for Sleep Staging Tasks With Explanations From Artificial Intelligence: User-Centered Design and Evaluation Study

Automatic Sleep Stage Scoring Using Time-Frequency Analysis and Stacked Sparse Autoencoders

A Multi-Level Interpretable Sleep Stage Scoring System by Infusing Experts' Knowledge Into a Deep Network Architecture

Automated scoring of pre-REM sleep in mice with deep learning

Do Not Sleep on Traditional Machine Learning: Simple and Interpretable Techniques Are Competitive to Deep Learning for Sleep Scoring