Explainable text-tabular models for predicting mortality risk in companion animals

James Burton, Sean Farrell, Peter-John Mäntylä Noble, Noura Al Moubayed
DOI: https://doi.org/10.1038/s41598-024-64551-1
IF: 4.6
2024-06-22
Scientific Reports
Abstract: As interest in using machine learning models to support clinical decision-making increases, explainability is an unequivocal priority for clinicians, researchers and regulators to comprehend and trust their results. With many clinical datasets containing a range of modalities, from the free-text of clinician notes to structured tabular data entries, there is a need for frameworks capable of providing comprehensive explanation values across diverse modalities. Here, we present a multimodal masking framework to extend the reach of SHapley Additive exPlanations (SHAP) to text and tabular datasets to identify risk factors for companion animal mortality in first-opinion veterinary electronic health records (EHRs) from across the United Kingdom. The framework is designed to treat each modality consistently, ensuring uniform and consistent treatment of features and thereby fostering predictability in unimodal and multimodal contexts. We present five multimodality approaches, with the best-performing method utilising PetBERT, a language model pre-trained on a veterinary dataset. Utilising our framework, we shed light for the first time on the reasons each model makes its decision and identify the inclination of PetBERT towards a more pronounced engagement with free-text narratives compared to BERT-base's predominant emphasis on tabular data. The investigation also explores the important features on a more granular level, identifying distinct words and phrases that substantially influenced an animal's life status prediction. PetBERT showcased a heightened ability to grasp phrases associated with veterinary clinical nomenclature, signalling the productivity of additional pre-training of language models.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses the lack of explainability in machine learning models used for veterinary clinical decision-making, specifically models that predict mortality risk in companion animals. Many clinical datasets mix modalities, such as free-text notes and structured tabular entries, and there is a need for frameworks that provide explanation values across all of them; without such explanations, model outputs are harder for clinicians to understand and trust. The paper proposes a multimodal masking framework that extends SHapley Additive exPlanations (SHAP) to combined text and tabular data, and applies it to identify risk factors for companion animal mortality in first-opinion veterinary electronic health records from across the United Kingdom. Of the five multimodal approaches compared, the best-performing one uses PetBERT, a language model pre-trained on a veterinary dataset. The explanations show that PetBERT relies more heavily on the free-text narratives, whereas BERT-base places more emphasis on the tabular data, and that PetBERT is better at picking up phrases from veterinary clinical nomenclature. The study also identifies specific words and phrases that influence the predicted life status of an animal, along with the importance of features such as age, breed, and deprivation score in predicting mortality risk. Overall, the paper highlights the value of explanation frameworks that handle multimodal data consistently for building clinicians' trust in model predictions.
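
To make the idea of masking text and tabular features in a uniform way more concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not use the SHAP library: it computes exact Shapley values by brute force over one joint feature set, where tabular fields are masked by setting them to None and text tokens are masked by replacing them with a placeholder. The names mask_input, toy_model and shapley_values, the "[MASK]" placeholder, the field names, and the scoring rule are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial

MASK = "[MASK]"  # hypothetical placeholder for a hidden text token


def mask_input(tabular, tokens, keep):
    """Hide every feature whose index is not in `keep`.

    Features are indexed jointly: tabular fields first, then text tokens,
    so both modalities go through the same masking step.
    """
    n_tab = len(tabular)
    masked_tab = {
        name: (value if i in keep else None)
        for i, (name, value) in enumerate(tabular.items())
    }
    masked_tokens = [
        tok if (n_tab + j) in keep else MASK for j, tok in enumerate(tokens)
    ]
    return masked_tab, masked_tokens


def toy_model(tabular, tokens):
    """Hypothetical risk score; a stand-in for a real text-tabular model."""
    score = 0.0
    age = tabular.get("age_years")
    if age is not None:
        score += 0.05 * age
    if "lethargic" in tokens:
        score += 0.3
    return score


def shapley_values(model, tabular, tokens):
    """Exact Shapley values over the joint set of tabular and text features.

    Brute force over all subsets, so only usable for a handful of features.
    """
    n = len(tabular) + len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                keep = set(subset)
                without_i = model(*mask_input(tabular, tokens, keep))
                with_i = model(*mask_input(tabular, tokens, keep | {i}))
                values[i] += weight * (with_i - without_i)
    return values


record = {"age_years": 12, "breed": "labrador"}
note = ["lethargic", "and", "off", "food"]
attributions = shapley_values(toy_model, record, note)
for name, value in zip(list(record) + note, attributions):
    print(f"{name}: {value:.3f}")
```

Because every feature, tabular or textual, passes through the same masking function, the attributions are directly comparable across modalities and sum to the difference between the model output on the full input and on the fully masked input. Practical SHAP explainers approximate this computation by sampling rather than enumerating every subset.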