Metastatic vs. Localized Disease As Inclusion Criteria That Can Be Automatically Extracted From Randomized Controlled Trials Using Natural Language Processing

Paul Windisch, Fabio Dennstaedt, Carole Koechli, Robert Foerster, Christina Schroeder, Daniel Matthias Aebersold, Daniel Rudolf Zwahlen
DOI: https://doi.org/10.1101/2024.06.17.24309020
2024-06-17
Abstract: Background: Extracting inclusion and exclusion criteria in a structured, automated fashion remains a challenge to developing better search functionalities or automating systematic reviews of randomized controlled trials in oncology. The question 'Did this trial enroll patients with localized disease, metastatic disease, or both?' could be used to narrow down the number of potentially relevant trials when conducting a search. Methods: 600 trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. 500 trials were used to develop and validate three different models, with 100 trials held out for testing. Results: On the test set, a rule-based system using regular expressions achieved an F1 score of 0.72 (95% CI: 0.64 - 0.81) for the prediction of whether the trial allowed for the inclusion of patients with localized disease and 0.77 (95% CI: 0.69 - 0.85) for metastatic disease. A transformer-based machine learning model achieved F1 scores of 0.97 (95% CI: 0.93 - 1.00) and 0.88 (95% CI: 0.82 - 0.94), respectively. The best performance was achieved by a combined approach in which the rule-based system was allowed to overrule the machine learning model, with F1 scores of 0.97 (95% CI: 0.94 - 1.00) and 0.89 (95% CI: 0.83 - 0.95), respectively. Conclusion: Automatic classification of cancer trials with regard to the inclusion of patients with localized and/or metastatic disease is feasible. Turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine.
Oncology
What problem does this paper attempt to address?
This paper addresses the difficulty of automatically extracting inclusion and exclusion criteria from randomized controlled trials (RCTs), particularly in oncology. Specifically, the researchers examine whether natural language processing (NLP) can automatically classify whether a trial allowed the inclusion of patients with localized disease, metastatic disease, or both. Solving this problem could improve literature search, streamline the automation of systematic reviews, and help quickly identify trials relevant to a specific clinical question. The key question is: "Did this trial enroll patients with localized disease, metastatic disease, or both?" Answering it through automatic classification narrows down the number of potentially relevant trials in a literature search. The researchers suggest that turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine.
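
The combined approach mentioned in the abstract, where a rule-based system may overrule the machine learning model, could look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the regular expressions, the function names, the 0.5 threshold, and the override logic (a rule that fires wins, otherwise the model decides) are hypothetical and are not taken from the paper. The `model_probs` dictionary stands in for the outputs of a transformer-based classifier, which is not implemented here.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual rules are not given here.
LOCAL_PATTERNS = [
    r"\blocali[sz]ed\b",
    r"\bearly[- ]stage\b",
    r"\bnon[- ]?metastatic\b",
]
METASTATIC_PATTERNS = [
    # Lookbehinds keep "non-metastatic" / "non metastatic" from counting as metastatic.
    r"(?<!non-)(?<!non )\bmetasta(?:tic|sis|ses)\b",
    r"\bstage iv\b",
]


def rule_vote(text, patterns):
    """Return True if any pattern matches the lowercased text, else None (abstain)."""
    lowered = text.lower()
    if any(re.search(pattern, lowered) for pattern in patterns):
        return True
    return None


def combine(rule_result, model_prob, threshold=0.5):
    """Let the rule overrule the model when it fires; otherwise threshold the model."""
    if rule_result is not None:
        return rule_result
    return model_prob >= threshold


def classify(text, model_probs):
    """Answer two binary questions: localized allowed? metastatic allowed?"""
    return {
        "localized": combine(rule_vote(text, LOCAL_PATTERNS), model_probs["localized"]),
        "metastatic": combine(rule_vote(text, METASTATIC_PATTERNS), model_probs["metastatic"]),
    }


if __name__ == "__main__":
    abstract = "Patients with metastatic castration-resistant prostate cancer were randomized."
    # Assumed model outputs: the model misses the metastatic label, the rule corrects it.
    print(classify(abstract, {"localized": 0.1, "metastatic": 0.3}))
```

Treating the task as two independent binary questions mirrors the framing in the abstract, where a trial may allow localized disease, metastatic disease, or both. Letting the rules abstain rather than vote "no" is one possible design choice: a missing keyword is weak evidence, so the model's prediction is kept in that case.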