Occupation Recognition and Exploitation in Rheumatology Clinical Notes: Employing Deep Learning Models for Named Entity Recognition and Knowledge Discovery in Electronic Health Records

Alfredo Madrid-García, Inés Pérez-Sancristóbal, Leticia Leon, Lydia Abásolo, Benjamín Fernández-Gutiérrez, Luis Rodríguez-Rodríguez
DOI: https://doi.org/10.1101/2024.05.08.24306389
2024-05-09
Abstract: Occupation is considered a Social Determinant of Health (SDOH) and its effects have been studied at multiple levels. Although the inclusion of such data in the Electronic Health Record (EHR) is vital for the provision of clinical care, especially in rheumatology where work disability prevention is essential, occupation information is often either not routinely documented or captured in an unstructured manner within conventional EHR systems. Encouraged by recent advances in natural language processing and deep learning models, we propose the use of novel architectures (i.e., transformers) to detect occupation mentions in rheumatology clinical notes of a tertiary hospital, and to identify to whom those occupations belong. We also aimed to evaluate the clinical and demographic characteristics that influence the collection of this SDOH, and the association between occupation and patients’ diagnosis. Bivariate and multivariate logistic regression analyses were conducted for this purpose. A Spanish pre-trained language model, RoBERTa, fine-tuned with biomedical texts was used to detect occupations. The best model achieved an F1-score of 0.725 identifying occupation mentions. Moreover, highly disabling mechanical pathology diagnoses (i.e., back pain, muscle disorders) were associated with a higher probability of occupation collection. Ultimately, we determined the professions most closely associated with more than ten categories of musculoskeletal disorders.
Rheumatology