Learning with incomplete labels of multisource datasets for ECG classification
Qince Li, Yang Liu, Ze Zhang, Jun Liu, Yongfeng Yuan, Kuanquan Wang, Runnan He
DOI: https://doi.org/10.1016/j.patcog.2024.110321
IF: 8
2024-06-01
Pattern Recognition
Abstract: The shortage of annotated ECG data presents a significant impediment, hampering the overall generalization capabilities of machine learning models tailored for automated ECG classification. The collective integration of multisource datasets presents a potential remedy for this challenge. However, it is crucial to underscore that the mere addition of supplementary data does not automatically guarantee performance enhancement, given the unresolved challenges associated with multisource data. In this research, we address one such challenge, namely, the issue of incomplete labels arising from the diversity of annotations within multi-source ECG datasets. First, we identified three distinct types of label missing: dataset-related label missing, supertype missing, and subtype missing. To address the supertype missing effectively, we introduce a novel approach known as offline category mapping which leverages the hierarchical relationships inherent within the categories to recover the missing supertype labels. Additionally, two complementary strategies, referred to as prediction masking and online category mapping, are proposed to mitigate the adverse effects of subtype and dataset-related label missing on model optimization. These strategies enhance the model's ability to identify missing subtypes under conditions of weak supervision. These pioneering methodologies are integrated into a deep learning-based framework designed for multilabel ECG classification. The performance of our proposed framework is rigorously evaluated using realistic multi-source datasets obtained from the PhysioNet/CinC challenge 2020/2021. The proposed learning framework exhibits a notable improvement in macro-average precision, surpassing the corresponding baseline model by more than 25% on the test datasets.
As a result, this research study makes a substantial contribution to the field of ECG classification by addressing the critical issue of incomplete labels in multisource datasets, ultimately enhancing the generalization capabilities of machine learning models in this domain.
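The abstract names offline category mapping and prediction masking without giving their details, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes a hypothetical subtype-to-supertype table (`PARENT`, with illustrative class names) and a per-dataset binary mask marking which classes a dataset actually annotates. Online category mapping is not sketched, since the abstract does not describe how it works.

```python
import math

# Hypothetical hierarchy: subtype -> supertype (illustrative names only,
# not the category tree used in the paper).
PARENT = {"CRBBB": "RBBB", "IRBBB": "RBBB"}


def offline_category_mapping(labels, parent=PARENT):
    """Recover missing supertype labels: every ancestor of a positive
    subtype is also marked positive."""
    recovered = set(labels)
    for label in labels:
        node = label
        while node in parent:  # walk up the hierarchy
            node = parent[node]
            recovered.add(node)
    return recovered


def masked_bce(probs, targets, mask):
    """Binary cross-entropy averaged only over classes where mask is 1.

    Classes a source dataset never annotates get mask 0, so their
    (unknown) targets do not contribute to the loss.
    """
    eps = 1e-7
    total, count = 0.0, 0
    for p, t, m in zip(probs, targets, mask):
        if not m:
            continue  # label missing for this class: skip it
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
        count += 1
    return total / count if count else 0.0
```

In this sketch, a record labelled only `CRBBB` would also receive `RBBB` before training, and a class absent from a dataset's annotation scheme is excluded from that record's loss rather than being treated as a negative.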
computer science, artificial intelligence; engineering, electrical & electronic