al-BERT: a semi-supervised denoising technique for disease prediction

Yun-Chien Tseng, Chuan-Wei Kuo, Wen-Chih Peng, Chih-Chieh Hung
DOI: https://doi.org/10.1186/s12911-024-02528-w
IF: 3.298
2024-05-17
BMC Medical Informatics and Decision Making
Abstract: Medical records are a valuable source for understanding patient health conditions. Doctors often use these records to assess health without solely depending on time-consuming and complex examinations. However, these records may not always be directly relevant to a patient's current health issue. For instance, information about common colds may not be relevant to a more specific health condition. While experienced doctors can effectively navigate through unnecessary details in medical records, this excess information presents a challenge for machine learning models in predicting diseases electronically. To address this, we have developed 'al-BERT', a new disease prediction model that leverages the BERT framework. This model is designed to identify crucial information from medical records and use it to predict diseases. 'al-BERT' operates on the principle that the structure of sentences in diagnostic records is similar to regular linguistic patterns. However, just as stuttering in speech can introduce 'noise' or irrelevant information, similar issues can arise in written records, complicating model training. To overcome this, 'al-BERT' incorporates a semi-supervised layer that filters out irrelevant data from patient visitation records. This process aims to refine the data, resulting in more reliable indicators for disease correlations and enhancing the model's predictive accuracy and utility in medical diagnostics.
medical informatics